Feature: refactor docker images (#510)

* add spark2.3.0 hadoop2.8.3 dockerfile

* start update to docker image

* add SPARK_DIST_CLASSPATH to bashrc, source .bashrc in docker run

* add maven install for jars

* docker image update and code fix

* add libthrift (still broken)

* start image refactor, build from source,

* add refactor to r base image

* finish refactor r image

* add storage jars and deps

* exclude netty to get rid of dependency conflict

* add miniconda image

* update 2.2.0 base, anaconda image

* remove unused cuda-8.0 image

* start pipenv implementation

* miniconda version arg

* update anaconda and miniconda image

* style

* pivot to virtualenv

* remove virtualenv from path when submitting apps

* flatten layers

* explicit calls to aztk python instead of activating virtualenv

* update base, miniconda, anaconda

* add compatibility version for base aztk images

* typo fix

* update pom

* update environment variable name

* update environment variables

* add anaconda images base & gpu

* update gpu and miniconda base images

* create venv in cluster create

* update base docker files, remove virtualenv

* fix path

* add exclusion to base images

* update r images

* delete python images (in favor of anaconda and miniconda)

* add miniconda gpu images

* update comment

* update aztk_version_compatibility to docker image version

* add a build script

* virtualenv->pipenv, add pipfile & pipfile.lock, remove secretstorage

* aztk/staging->aztk/spark

* remove jars, add .null to keep directory

* update pipfile, update jupyter and jupyterlab

* update default images

* update base images to fix hdfs

* update build script with correct path

* add spark1.6.3 anaconda, miniconda, r base and gpu images

* update build script to include spark1.6.3

* mkdir out

* exclude commons lang and slf4j dependencies

* mkdir out

* no fail if dir exists

* update node_scripts

* update env var name

* update env var name

* fix the docker_repo docs

* master->0.7.0
Jacob Freck committed 2018-04-30 17:19:01 -07:00 (committed via GitHub)
Parent 47000a5c7d
Commit 779bffb2da
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
68 changed files: 1806 additions and 1081 deletions

View file

@ -27,7 +27,7 @@ This toolkit is built on top of Azure Batch but does not require any Azure Batch
```
3. Login or register for an [Azure Account](https://azure.microsoft.com), navigate to [Azure Cloud Shell](https://shell.azure.com), and run:
```sh
wget -q https://raw.githubusercontent.com/Azure/aztk/master/account_setup.sh -O account_setup.sh &&
wget -q https://raw.githubusercontent.com/Azure/aztk/v0.7.0/account_setup.sh -O account_setup.sh &&
chmod 755 account_setup.sh &&
/bin/bash account_setup.sh
```

View file

@ -4,7 +4,7 @@ echo "Installing dependencies..." &&
pip install --force-reinstall --upgrade --user pyyaml==3.12 azure==3.0.0 azure-cli-core==2.0.30 msrestazure==0.4.25 > /dev/null 2>&1 &&
echo "Finished installing depdencies." &&
echo "Getting account setup script..." &&
wget -q https://raw.githubusercontent.com/Azure/aztk/master/account_setup.py -O account_setup.py &&
wget -q https://raw.githubusercontent.com/Azure/aztk/v0.7.0/account_setup.py -O account_setup.py &&
chmod 755 account_setup.py &&
echo "Finished getting account setup script." &&
echo "Running account setup script..." &&

17
aztk/node_scripts/Pipfile Normal file
View file

@ -0,0 +1,17 @@
[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"
[packages]
azure-batch = "==4.1.3"
azure-mgmt-batch = "==5.0.0"
azure-mgmt-storage = "==1.5.0"
azure-storage-blob = "==1.1.0"
pycryptodome = "==3.4.7"
PyYAML = "==3.12"
[dev-packages]
[requires]
python_version = "3.5"
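
For context on how this Pipfile is consumed: the node bootstrap (see the setup_host.sh changes below) copies it into a project-local directory and builds an in-project virtualenv rather than installing into the system Python. A minimal sketch of that flow, assuming $AZTK_WORKING_DIR is already set by the node bootstrap:

```sh
# copy the pinned spec next to the environment directory
mkdir -p $AZTK_WORKING_DIR/.aztk-env
cp Pipfile Pipfile.lock $AZTK_WORKING_DIR/.aztk-env
cd $AZTK_WORKING_DIR/.aztk-env

# keep the virtualenv inside the project directory (.aztk-env/.venv)
export PIPENV_VENV_IN_PROJECT=true
pipenv install --python /usr/bin/python3.5m

# aztk scripts then call this interpreter explicitly instead of activating the venv
$AZTK_WORKING_DIR/.aztk-env/.venv/bin/python --version
```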

291
aztk/node_scripts/Pipfile.lock generated Normal file
View file

@ -0,0 +1,291 @@
{
"_meta": {
"hash": {
"sha256": "6ec054e45a39a75baeae8d6c48097a02a4d690c77a48d79a24c4a396b3799565"
},
"pipfile-spec": 6,
"requires": {
"python_version": "3.5"
},
"sources": [
{
"name": "pypi",
"url": "https://pypi.python.org/simple",
"verify_ssl": true
}
]
},
"default": {
"adal": {
"hashes": [
"sha256:83b746883f3bd7216664463af70c05e847abd8e5b259d91eb49d692bec519a24",
"sha256:dd3ecb2dfb2de9393320d0ed4e6115ed07a6984a28e18adf46499b91d3c3a494"
],
"version": "==0.5.1"
},
"asn1crypto": {
"hashes": [
"sha256:2f1adbb7546ed199e3c90ef23ec95c5cf3585bac7d11fb7eb562a3fe89c64e87",
"sha256:9d5c20441baf0cb60a4ac34cc447c6c189024b6b4c6cd7877034f4965c464e49"
],
"version": "==0.24.0"
},
"azure-batch": {
"hashes": [
"sha256:017be21a9e6db92473d2e33170d5dd445596fc70d706f73552ac9c6b57a6ef1c",
"sha256:cd71c7ebb5beab174b6225bbf79ae18d6db0c8d63227a7e514da0a75f138364c"
],
"index": "pypi",
"version": "==4.1.3"
},
"azure-common": {
"hashes": [
"sha256:4fdc3a6d94d7073a76e04d59435e279decb91022520550ef08f2b6f316b72563",
"sha256:5124ab76357452356164ef1a10e7fe69f686eaf1647ef57b37c2ede50df2cc02"
],
"version": "==1.1.9"
},
"azure-mgmt-batch": {
"hashes": [
"sha256:bc8ab35d21a07e17a4007efeb14a607a86315be5577d521fac53239f2270a633",
"sha256:e83988711449d1ad4fe3db5c88c2b08aede073b113f2c5b423af155b1bd5f944"
],
"index": "pypi",
"version": "==5.0.0"
},
"azure-mgmt-nspkg": {
"hashes": [
"sha256:0bd439a8e9529387246c3e335920d6474fb67e12f963e4a40bec54933b347220",
"sha256:e36488d4f5d7d668ef5cc3e6e86f081448fd60c9bf4e051d06ff7cfc5a653e6f"
],
"version": "==2.0.0"
},
"azure-mgmt-storage": {
"hashes": [
"sha256:b1fc3a293051dee35dffe12d618f925581d6536c94ca5c05b69461ce941125a1",
"sha256:d7a60f0675d49f70e74927814e0f1112e6482073c31a95478a55f5bb6e0691db"
],
"index": "pypi",
"version": "==1.5.0"
},
"azure-nspkg": {
"hashes": [
"sha256:4bd758e649f57cc188db4f3c64becaca16195e057e4362b6caad56fe1e7934e9",
"sha256:fe19ee5d8c66ee8ef62557fc7310f59cffb7230f0a94701eef79f6e3191fdc7b"
],
"version": "==2.0.0"
},
"azure-storage-blob": {
"hashes": [
"sha256:4fdcdc20e36d0f97a58bdffe1b26fc2b8b983c59ff8625e961c188c925891c66",
"sha256:71d08a195a8cc732cbc0a45a552c7c8d495a2ef3721cbc993d0e586d0493d529"
],
"index": "pypi",
"version": "==1.1.0"
},
"azure-storage-common": {
"hashes": [
"sha256:2aad9fdaa6052867f19515a5d0acaa650103532cc50a8a8974b0d76e485525a0",
"sha256:8c67a4b0ad9ef16c4da3ca050ac7ad2117818797365d7e3bb4f371bdb78040cf"
],
"version": "==1.1.0"
},
"azure-storage-nspkg": {
"hashes": [
"sha256:4fc4685aef941eab2f7fb53824254cca2e38f2a1bf33cda0c8ae654fe15827d6",
"sha256:855315c038c0e695868025127e1b3057a1f984af9ccfbaeac4fbfd6c5dd3b466"
],
"version": "==3.0.0"
},
"certifi": {
"hashes": [
"sha256:13e698f54293db9f89122b0581843a782ad0934a4fe0172d2a980ba77fc61bb7",
"sha256:9fa520c1bacfb634fa7af20a76bcbd3d5fb390481724c597da32c719a7dca4b0"
],
"version": "==2018.4.16"
},
"cffi": {
"hashes": [
"sha256:151b7eefd035c56b2b2e1eb9963c90c6302dc15fbd8c1c0a83a163ff2c7d7743",
"sha256:1553d1e99f035ace1c0544050622b7bc963374a00c467edafac50ad7bd276aef",
"sha256:1b0493c091a1898f1136e3f4f991a784437fac3673780ff9de3bcf46c80b6b50",
"sha256:2ba8a45822b7aee805ab49abfe7eec16b90587f7f26df20c71dd89e45a97076f",
"sha256:3c85641778460581c42924384f5e68076d724ceac0f267d66c757f7535069c93",
"sha256:3eb6434197633b7748cea30bf0ba9f66727cdce45117a712b29a443943733257",
"sha256:4c91af6e967c2015729d3e69c2e51d92f9898c330d6a851bf8f121236f3defd3",
"sha256:770f3782b31f50b68627e22f91cb182c48c47c02eb405fd689472aa7b7aa16dc",
"sha256:79f9b6f7c46ae1f8ded75f68cf8ad50e5729ed4d590c74840471fc2823457d04",
"sha256:7a33145e04d44ce95bcd71e522b478d282ad0eafaf34fe1ec5bbd73e662f22b6",
"sha256:857959354ae3a6fa3da6651b966d13b0a8bed6bbc87a0de7b38a549db1d2a359",
"sha256:87f37fe5130574ff76c17cab61e7d2538a16f843bb7bca8ebbc4b12de3078596",
"sha256:95d5251e4b5ca00061f9d9f3d6fe537247e145a8524ae9fd30a2f8fbce993b5b",
"sha256:9d1d3e63a4afdc29bd76ce6aa9d58c771cd1599fbba8cf5057e7860b203710dd",
"sha256:a36c5c154f9d42ec176e6e620cb0dd275744aa1d804786a71ac37dc3661a5e95",
"sha256:ae5e35a2c189d397b91034642cb0eab0e346f776ec2eb44a49a459e6615d6e2e",
"sha256:b0f7d4a3df8f06cf49f9f121bead236e328074de6449866515cea4907bbc63d6",
"sha256:b75110fb114fa366b29a027d0c9be3709579602ae111ff61674d28c93606acca",
"sha256:ba5e697569f84b13640c9e193170e89c13c6244c24400fc57e88724ef610cd31",
"sha256:be2a9b390f77fd7676d80bc3cdc4f8edb940d8c198ed2d8c0be1319018c778e1",
"sha256:d5d8555d9bfc3f02385c1c37e9f998e2011f0db4f90e250e5bc0c0a85a813085",
"sha256:e55e22ac0a30023426564b1059b035973ec82186ddddbac867078435801c7801",
"sha256:e90f17980e6ab0f3c2f3730e56d1fe9bcba1891eeea58966e89d352492cc74f4",
"sha256:ecbb7b01409e9b782df5ded849c178a0aa7c906cf8c5a67368047daab282b184",
"sha256:ed01918d545a38998bfa5902c7c00e0fee90e957ce036a4000a88e3fe2264917",
"sha256:edabd457cd23a02965166026fd9bfd196f4324fe6032e866d0f3bd0301cd486f",
"sha256:fdf1c1dc5bafc32bc5d08b054f94d659422b05aba244d6be4ddc1c72d9aa70fb"
],
"markers": "platform_python_implementation != 'pypy'",
"version": "==1.11.5"
},
"chardet": {
"hashes": [
"sha256:84ab92ed1c4d4f16916e05906b6b75a6c0fb5db821cc65e70cbd64a3e2a5eaae",
"sha256:fc323ffcaeaed0e0a02bf4d117757b98aed530d9ed4531e3e15460124c106691"
],
"version": "==3.0.4"
},
"cryptography": {
"hashes": [
"sha256:3f3b65d5a16e6b52fba63dc860b62ca9832f51f1a2ae5083c78b6840275f12dd",
"sha256:551a3abfe0c8c6833df4192a63371aa2ff43afd8f570ed345d31f251d78e7e04",
"sha256:5cb990056b7cadcca26813311187ad751ea644712022a3976443691168781b6f",
"sha256:60bda7f12ecb828358be53095fc9c6edda7de8f1ef571f96c00b2363643fa3cd",
"sha256:6fef51ec447fe9f8351894024e94736862900d3a9aa2961528e602eb65c92bdb",
"sha256:77d0ad229d47a6e0272d00f6bf8ac06ce14715a9fd02c9a97f5a2869aab3ccb2",
"sha256:808fe471b1a6b777f026f7dc7bd9a4959da4bfab64972f2bbe91e22527c1c037",
"sha256:9b62fb4d18529c84b961efd9187fecbb48e89aa1a0f9f4161c61b7fc42a101bd",
"sha256:9e5bed45ec6b4f828866ac6a6bedf08388ffcfa68abe9e94b34bb40977aba531",
"sha256:9fc295bf69130a342e7a19a39d7bbeb15c0bcaabc7382ec33ef3b2b7d18d2f63",
"sha256:abd070b5849ed64e6d349199bef955ee0ad99aefbad792f0c587f8effa681a5e",
"sha256:ba6a774749b6e510cffc2fb98535f717e0e5fd91c7c99a61d223293df79ab351",
"sha256:c332118647f084c983c6a3e1dba0f3bcb051f69d12baccac68db8d62d177eb8a",
"sha256:d6f46e862ee36df81e6342c2177ba84e70f722d9dc9c6c394f9f1f434c4a5563",
"sha256:db6013746f73bf8edd9c3d1d3f94db635b9422f503db3fc5ef105233d4c011ab",
"sha256:f57008eaff597c69cf692c3518f6d4800f0309253bb138b526a37fe9ef0c7471",
"sha256:f6c821ac253c19f2ad4c8691633ae1d1a17f120d5b01ea1d256d7b602bc59887"
],
"version": "==2.2.2"
},
"idna": {
"hashes": [
"sha256:2c6a5de3089009e3da7c5dde64a141dbc8551d5b7f6cf4ed7c2568d0cc520a8f",
"sha256:8c7309c718f94b3a625cb648ace320157ad16ff131ae0af362c9f21b80ef6ec4"
],
"version": "==2.6"
},
"isodate": {
"hashes": [
"sha256:2e364a3d5759479cdb2d37cce6b9376ea504db2ff90252a2e5b7cc89cc9ff2d8",
"sha256:aa4d33c06640f5352aca96e4b81afd8ab3b47337cc12089822d6f322ac772c81"
],
"version": "==0.6.0"
},
"msrest": {
"hashes": [
"sha256:2920c4eee294a901a59480c72e70092ebbac4849bc2237e064cb9feed174deeb",
"sha256:65bdde2ea8aa3312eb4ce6142d5da65d455f561a7676eee678c1a6e00416f5a0"
],
"version": "==0.4.28"
},
"msrestazure": {
"hashes": [
"sha256:4e336150730f9a512f1432c4e0c5293d618ffcbf92767c07525bd8a8200fa9d5",
"sha256:5b33886aaaf068acec17d76127d95290c9eaca7942711184da991cabd3929854"
],
"version": "==0.4.28"
},
"oauthlib": {
"hashes": [
"sha256:09d438bcac8f004ae348e721e9d8a7792a9e23cd574634e973173344046287f5",
"sha256:909665297635fa11fe9914c146d875f2ed41c8c2d78e21a529dd71c0ba756508"
],
"version": "==2.0.7"
},
"pycparser": {
"hashes": [
"sha256:99a8ca03e29851d96616ad0404b4aad7d9ee16f25c9f9708a11faf2810f7b226"
],
"version": "==2.18"
},
"pycryptodome": {
"hashes": [
"sha256:15ced95a00b55bb2fc22f3dddde1c8d6f270089f35c3af0e07306bc2ba1e1c4e",
"sha256:18d8dfe31bf0cb53d58694903e526be68f3cf48e6e3c6dfbbc1e7042b1693af7",
"sha256:2174fa555916b5ae8bcc7747ecfe2a4d5943b42c9dcf4878e269baaae264e85d",
"sha256:6f64d8b63034fd9289bae4cb48aa8f7049f6b8db702c7af50cb3718821d28147",
"sha256:8440a35ccd52f0eab0f4ece284bd13a587d86d79bd404d8914f81eda74a66de1",
"sha256:8851b1e1d85e4fb981048c8a8a8431839103f43ea3c35f1b46bae2e41699f439",
"sha256:9fc97cd0f6eeec59af736b3df81e5811d836fa646b89a4325672dcaf997250b3",
"sha256:a9e3e3e9ab0241b0303206656a74d5cd6bd00fcad6f9ffd0ba6b8e35072f74d7",
"sha256:ec560e62258358afd7a1a3d34c8860fdf478e28c0999173f2d5c618fd2fd60d3",
"sha256:f0196124f83221f9c5e06a68e247019466395d35d92d4ce4482c835f75302851",
"sha256:f7befe2249df41e012a3d8079ab3c7089be21969591eb77b21767fa24557a7b7"
],
"index": "pypi",
"version": "==3.4.7"
},
"pyjwt": {
"hashes": [
"sha256:bca523ef95586d3a8a5be2da766fe6f82754acba27689c984e28e77a12174593",
"sha256:dacba5786fe3bf1a0ae8673874e29f9ac497860955c501289c63b15d3daae63a"
],
"version": "==1.6.1"
},
"python-dateutil": {
"hashes": [
"sha256:3220490fb9741e2342e1cf29a503394fdac874bc39568288717ee67047ff29df",
"sha256:9d8074be4c993fbe4947878ce593052f71dac82932a677d49194d8ce9778002e"
],
"version": "==2.7.2"
},
"pyyaml": {
"hashes": [
"sha256:0c507b7f74b3d2dd4d1322ec8a94794927305ab4cebbe89cc47fe5e81541e6e8",
"sha256:16b20e970597e051997d90dc2cddc713a2876c47e3d92d59ee198700c5427736",
"sha256:3262c96a1ca437e7e4763e2843746588a965426550f3797a79fca9c6199c431f",
"sha256:326420cbb492172dec84b0f65c80942de6cedb5233c413dd824483989c000608",
"sha256:4474f8ea030b5127225b8894d626bb66c01cda098d47a2b0d3429b6700af9fd8",
"sha256:592766c6303207a20efc445587778322d7f73b161bd994f227adaa341ba212ab",
"sha256:5ac82e411044fb129bae5cfbeb3ba626acb2af31a8d17d175004b70862a741a7",
"sha256:5f84523c076ad14ff5e6c037fe1c89a7f73a3e04cf0377cb4d017014976433f3",
"sha256:827dc04b8fa7d07c44de11fabbc888e627fa8293b695e0f99cb544fdfa1bf0d1",
"sha256:b4c423ab23291d3945ac61346feeb9a0dc4184999ede5e7c43e1ffb975130ae6",
"sha256:bc6bced57f826ca7cb5125a10b23fd0f2fff3b7c4701d64c439a300ce665fff8",
"sha256:c01b880ec30b5a6e6aa67b09a2fe3fb30473008c85cd6a67359a1b15ed6d83a4",
"sha256:ca233c64c6e40eaa6c66ef97058cdc80e8d0157a443655baa1b2966e812807ca",
"sha256:e863072cdf4c72eebf179342c94e6989c67185842d9997960b3e69290b2fa269"
],
"index": "pypi",
"version": "==3.12"
},
"requests": {
"hashes": [
"sha256:6a1b267aa90cac58ac3a765d067950e7dbbf75b1da07e895d1f594193a40a38b",
"sha256:9c443e7324ba5b85070c4a818ade28bfabedf16ea10206da1132edaa6dda237e"
],
"version": "==2.18.4"
},
"requests-oauthlib": {
"hashes": [
"sha256:50a8ae2ce8273e384895972b56193c7409601a66d4975774c60c2aed869639ca",
"sha256:883ac416757eada6d3d07054ec7092ac21c7f35cb1d2cf82faf205637081f468"
],
"version": "==0.8.0"
},
"six": {
"hashes": [
"sha256:70e8a77beed4562e7f14fe23a786b54f6296e34344c23bc42f07b15018ff98e9",
"sha256:832dc0e10feb1aa2c68dcc57dbb658f1c7e65b9b61af69048abc87a2db00a0eb"
],
"version": "==1.11.0"
},
"urllib3": {
"hashes": [
"sha256:06330f386d6e4b195fbfc736b297f58c5a892e4440e54d294d7004e3a9bbea1b",
"sha256:cc44da8e1145637334317feebd728bd869a35285b93cbb4cca2577da7e62db4f"
],
"version": "==1.22"
}
},
"develop": {}
}
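
Because the lock file pins exact versions and hashes, it can be checked against the Pipfile before it is baked into the node environment; a small sketch (pipenv's --deploy flag aborts if the lock file is out of date):

```sh
cd aztk/node_scripts
pipenv install --deploy --python 3.5   # fails fast if Pipfile.lock no longer matches Pipfile
```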

View file

@ -3,6 +3,7 @@
# This file is the entry point of the docker container.
set -e
source ~/.bashrc
echo "Initializing spark container"
# --------------------
@ -25,15 +26,14 @@ done
# ----------------------------
# Run aztk setup python scripts
# ----------------------------
# use python v3.5.4 to run aztk software
# setup docker container
echo "Starting setup using Docker"
$(pyenv root)/versions/$AZTK_PYTHON_VERSION/bin/pip install -r $(dirname $0)/requirements.txt
export PYTHONPATH=$PYTHONPATH:$AZTK_WORKING_DIR
echo 'export PYTHONPATH=$PYTHONPATH:$AZTK_WORKING_DIR' >> ~/.bashrc
echo "Running main.py script"
$(pyenv root)/versions/$AZTK_PYTHON_VERSION/bin/python $(dirname $0)/main.py setup-spark-container
$AZTK_WORKING_DIR/.aztk-env/.venv/bin/python $(dirname $0)/main.py setup-spark-container
# sleep to keep container running
while true; do sleep 1; done

View file

@ -4,3 +4,4 @@ azure-mgmt-storage==1.5.0
azure-storage-blob==1.1.0
pyyaml==3.12
pycryptodome==3.4.7

View file

@ -11,12 +11,13 @@ container_name=$1
docker_repo_name=$2
echo "Installing pre-reqs"
apt-get -y install linux-image-extra-$(uname -r) linux-image-extra-virtual
apt-get -y install apt-transport-https
apt-get -y install curl
apt-get -y install ca-certificates
apt-get -y install software-properties-common
apt-get -y install python3-pip python-dev build-essential libssl-dev
apt-get -y update
apt-get install -y --no-install-recommends linux-image-extra-$(uname -r) linux-image-extra-virtual
apt-get install -y --no-install-recommends apt-transport-https
apt-get install -y --no-install-recommends curl
apt-get install -y --no-install-recommends ca-certificates
apt-get install -y --no-install-recommends software-properties-common
apt-get install -y --no-install-recommends python3-pip python3-venv python-dev build-essential libssl-dev
echo "Done installing pre-reqs"
# Install docker
@ -78,12 +79,25 @@ else
echo "Node python version:"
python3 --version
# set up aztk python environment
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
python3 -m pip install pipenv
mkdir -p $AZTK_WORKING_DIR/.aztk-env
cp $AZTK_WORKING_DIR/aztk/node_scripts/Pipfile $AZTK_WORKING_DIR/.aztk-env
cp $AZTK_WORKING_DIR/aztk/node_scripts/Pipfile.lock $AZTK_WORKING_DIR/.aztk-env
cd $AZTK_WORKING_DIR/.aztk-env
export PIPENV_VENV_IN_PROJECT=true
pipenv install --python /usr/bin/python3.5m
pipenv run pip install --upgrade setuptools wheel #TODO: add pip when pipenv is compatible with pip10
# Install python dependencies
pip3 install -r $(dirname $0)/requirements.txt
$AZTK_WORKING_DIR/.aztk-env/.venv/bin/pip install -r $(dirname $0)/requirements.txt
export PYTHONPATH=$PYTHONPATH:$AZTK_WORKING_DIR
echo "Running setup python script"
python3 $(dirname $0)/main.py setup-node $docker_repo_name
$AZTK_WORKING_DIR/.aztk-env/.venv/bin/python $(dirname $0)/main.py setup-node $docker_repo_name
# wait until container is running
until [ "`/usr/bin/docker inspect -f {{.State.Running}} $container_name`"=="true" ]; do
@ -94,7 +108,7 @@ else
# wait until container setup is complete
echo "Waiting for spark docker container to setup."
docker exec spark /bin/bash -c 'python $AZTK_WORKING_DIR/aztk/node_scripts/wait_until_setup_complete.py'
docker exec spark /bin/bash -c '$AZTK_WORKING_DIR/.aztk-env/.venv/bin/python $AZTK_WORKING_DIR/aztk/node_scripts/wait_until_setup_complete.py'
# Setup symbolic link for the docker logs
docker_log=$(docker inspect --format='{{.LogPath}}' $container_name)

View file

@ -101,7 +101,7 @@ def __cluster_install_cmd(zip_resource_file: batch_models.ResourceFile,
'apt-get -y update',
'apt-get install --fix-missing',
'apt-get -y install unzip',
'unzip $AZ_BATCH_TASK_WORKING_DIR/{0}'.format(
'unzip -o $AZ_BATCH_TASK_WORKING_DIR/{0}'.format(
zip_resource_file.file_path),
'chmod 777 $AZ_BATCH_TASK_WORKING_DIR/aztk/node_scripts/setup_host.sh',
'/bin/bash $AZ_BATCH_TASK_WORKING_DIR/aztk/node_scripts/setup_host.sh {0} {1}'.format(

View file

@ -19,9 +19,10 @@ def __app_cmd():
docker_exec.add_argument("-i")
docker_exec.add_option("-e", "AZ_BATCH_TASK_WORKING_DIR=$AZ_BATCH_TASK_WORKING_DIR")
docker_exec.add_option("-e", "AZ_BATCH_JOB_ID=$AZ_BATCH_JOB_ID")
docker_exec.add_argument("spark /bin/bash >> output.log 2>&1 -c \""\
"source ~/.bashrc; "\
"python \$AZTK_WORKING_DIR/aztk/node_scripts/job_submission.py\"")
docker_exec.add_argument("spark /bin/bash >> output.log 2>&1 -c \"" \
"source ~/.bashrc; " \
"export PYTHONPATH=$PYTHONPATH:\$AZTK_WORKING_DIR; " \
"$AZTK_WORKING_DIR/.aztk-env/.venv/bin/python \$AZTK_WORKING_DIR/aztk/node_scripts/job_submission.py\"")
return docker_exec.to_str()
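
For readability, the docker exec command assembled above expands to roughly the following shell invocation (illustrative only; the exact quoting and escaping come from the command builder):

```sh
docker exec -i \
    -e AZ_BATCH_TASK_WORKING_DIR=$AZ_BATCH_TASK_WORKING_DIR \
    -e AZ_BATCH_JOB_ID=$AZ_BATCH_JOB_ID \
    spark /bin/bash >> output.log 2>&1 -c \
    "source ~/.bashrc; \
     export PYTHONPATH=\$PYTHONPATH:\$AZTK_WORKING_DIR; \
     \$AZTK_WORKING_DIR/.aztk-env/.venv/bin/python \$AZTK_WORKING_DIR/aztk/node_scripts/job_submission.py"
```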

View file

@ -82,10 +82,10 @@ def generate_task(spark_client, container_id, application):
task_cmd.add_option('-e', 'AZ_BATCH_TASK_WORKING_DIR=$AZ_BATCH_TASK_WORKING_DIR')
task_cmd.add_option('-e', 'STORAGE_LOGS_CONTAINER={0}'.format(container_id))
task_cmd.add_argument('spark /bin/bash >> output.log 2>&1')
task_cmd.add_argument('-c "source ~/.bashrc; '\
task_cmd.add_argument('-c "source ~/.bashrc; ' \
'export PYTHONPATH=$PYTHONPATH:\$AZTK_WORKING_DIR; ' \
'cd $AZ_BATCH_TASK_WORKING_DIR; ' \
'\$(pyenv root)/versions/\$AZTK_PYTHON_VERSION/bin/python ' \
'\$AZTK_WORKING_DIR/aztk/node_scripts/submit.py"')
'\$AZTK_WORKING_DIR/.aztk-env/.venv/bin/python \$AZTK_WORKING_DIR/aztk/node_scripts/submit.py"')
# Create task
task = batch_models.TaskAddParameter(

View file

@ -13,13 +13,13 @@ echo "Is master: $AZTK_IS_MASTER"
if [ "$AZTK_IS_MASTER" = "true" ]; then
pip install jupyter --upgrade
pip install notebook --upgrade
PYSPARK_DRIVER_PYTHON="/.pyenv/versions/${USER_PYTHON_VERSION}/bin/jupyter"
JUPYTER_KERNELS="/.pyenv/versions/${USER_PYTHON_VERSION}/share/jupyter/kernels"
PYSPARK_DRIVER_PYTHON="/opt/conda/bin/jupyter"
JUPYTER_KERNELS="/opt/conda/share/jupyter/kernels"
# disable password/token on jupyter notebook
jupyter notebook --generate-config --allow-root
JUPYTER_CONFIG='/.jupyter/jupyter_notebook_config.py'
JUPYTER_CONFIG='/root/.jupyter/jupyter_notebook_config.py'
echo >> $JUPYTER_CONFIG
echo -e 'c.NotebookApp.token=""' >> $JUPYTER_CONFIG
echo -e 'c.NotebookApp.password=""' >> $JUPYTER_CONFIG

View file

@ -9,12 +9,12 @@
if [ "$AZTK_IS_MASTER" = "true" ]; then
conda install -c conda-forge jupyterlab
PYSPARK_DRIVER_PYTHON="/.pyenv/versions/${USER_PYTHON_VERSION}/bin/jupyter"
JUPYTER_KERNELS="/.pyenv/versions/${USER_PYTHON_VERSION}/share/jupyter/kernels"
PYSPARK_DRIVER_PYTHON="/opt/conda/bin/jupyter"
JUPYTER_KERNELS="/opt/conda/share/jupyter/kernels"
# disable password/token on jupyter notebook
jupyter lab --generate-config --allow-root
JUPYTER_CONFIG='/.jupyter/jupyter_notebook_config.py'
JUPYTER_CONFIG='/root/.jupyter/jupyter_notebook_config.py'
echo >> $JUPYTER_CONFIG
echo -e 'c.NotebookApp.token=""' >> $JUPYTER_CONFIG
echo -e 'c.NotebookApp.password=""' >> $JUPYTER_CONFIG

View file

@ -11,6 +11,7 @@ if [ "$AZTK_IS_MASTER" = "true" ]; then
## Download and install Rstudio Server
wget https://download2.rstudio.org/rstudio-server-$RSTUDIO_SERVER_VERSION-amd64.deb
apt-get install -y --no-install-recommends gdebi-core
gdebi rstudio-server-$RSTUDIO_SERVER_VERSION-amd64.deb --non-interactive
echo "server-app-armor-enabled=0" | tee -a /etc/rstudio/rserver.conf
rm rstudio-server-$RSTUDIO_SERVER_VERSION-amd64.deb

View file

@ -1,2 +1,2 @@
#!/bin/bash
python $DOCKER_WORKING_DIR/plugins/spark_ui_proxy/spark_ui_proxy.py $1 $2 &
python $AZTK_WORKING_DIR/plugins/spark_ui_proxy/spark_ui_proxy.py $1 $2 &

View file

@ -2,10 +2,10 @@ import os
"""
DOCKER
"""
DEFAULT_DOCKER_REPO = "aztk/base:latest"
DEFAULT_DOCKER_REPO_GPU = "aztk/gpu:latest"
DEFAULT_SPARK_PYTHON_DOCKER_REPO = "aztk/python:latest"
DEFAULT_SPARK_R_BASE_DOCKER_REPO = "aztk/r-base:latest"
DEFAULT_DOCKER_REPO = "aztk/spark:v0.1.0-spark2.3.0-base"
DEFAULT_DOCKER_REPO_GPU = "aztk/spark:v0.1.0-spark2.3.0-gpu"
DEFAULT_SPARK_PYTHON_DOCKER_REPO = "aztk/spark:v0.1.0-spark2.3.0-miniconda-base"
DEFAULT_SPARK_R_BASE_DOCKER_REPO = "aztk/spark:v0.1.0-spark2.3.0-r-base"
DOCKER_SPARK_CONTAINER_NAME = "spark"
# DOCKER SPARK

View file

@ -14,7 +14,7 @@ size: 2
# username: <username for the linux user to be created> (optional)
username: spark
# docker_repo: <name of docker image repo (for more information, see https://github.com/Azure/aztk/blob/master/docs/12-docker-image.md)>
# docker_repo: <name of docker image repo (for more information, see https://github.com/Azure/aztk/blob/v0.7.0/docs/12-docker-image.md)>
docker_repo:
# # optional custom scripts to run on the Spark master, Spark worker or all nodes in the cluster
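
To pin a cluster to one of the new images explicitly, docker_repo can be set to a specific tag from the defaults above; a hedged CLI example (the --docker-repo flag name is assumed from the docs on selecting a docker-repo at cluster creation time):

```sh
# hypothetical invocation; image tag taken from the new defaults in constants.py
aztk spark cluster create \
    --id my-cluster \
    --size 2 \
    --docker-repo aztk/spark:v0.1.0-spark2.3.0-miniconda-base
```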

View file

Binary file not shown.

Binary data
aztk_cli/config/jars/azure-storage-2.0.0.jar

Binary file not shown.

Binary data
aztk_cli/config/jars/hadoop-azure-2.7.3.jar

Binary file not shown.

Binary file not shown.

View file

@ -1,7 +1,7 @@
# Job Configuration
# An Aztk Job is a cluster and an array of Spark applications to run on that cluster
# AZTK Spark Jobs will automatically manage the lifecycle of the cluster
# For more information see the documentation at: https://github.com/Azure/aztk/blob/master/docs/70-jobs.md
# For more information see the documentation at: https://github.com/Azure/aztk/blob/v0.7.0/docs/70-jobs.md
job:
id:

View file

@ -1,5 +1,5 @@
# For instructions on creating a Batch and Storage account, see
# Getting Started (https://github.com/Azure/aztk/blob/master/docs/00-getting-started.md)
# Getting Started (https://github.com/Azure/aztk/blob/v0.7.0/docs/00-getting-started.md)
# NOTE - YAML requires a space after the colon. Ex: "batchaccountname: mybatchaccount"
service_principal:

View file

@ -12,12 +12,12 @@ if [ "$AZTK_IS_MASTER" = "true" ]; then
pip install jupyter --upgrade
pip install notebook --upgrade
PYSPARK_DRIVER_PYTHON="/.pyenv/versions/${USER_PYTHON_VERSION}/bin/jupyter"
JUPYTER_KERNELS="/.pyenv/versions/${USER_PYTHON_VERSION}/share/jupyter/kernels"
PYSPARK_DRIVER_PYTHON="/opt/conda/bin/jupyter"
JUPYTER_KERNELS="/opt/conda/share/jupyter/kernels"
# disable password/token on jupyter notebook
jupyter notebook --generate-config --allow-root
JUPYTER_CONFIG='/.jupyter/jupyter_notebook_config.py'
JUPYTER_CONFIG='/root/.jupyter/jupyter_notebook_config.py'
echo >> $JUPYTER_CONFIG
echo -e 'c.NotebookApp.token=""' >> $JUPYTER_CONFIG
echo -e 'c.NotebookApp.password=""' >> $JUPYTER_CONFIG

View file

@ -2,114 +2,3 @@
Azure Distributed Data Engineering Toolkit uses Docker containers to run Spark.
Please refer to the docs for details on [how to select a docker-repo at cluster creation time](../docs/12-docker-image.md).
## Supported Images
By default, this toolkit will use the base Spark image, __aztk/base__. This image contains the bare minimum to get Spark up and running in standalone mode.
On top of that, we also provide additional flavors of Spark images: one geared towards the Python user (PySpark), and the other geared towards the R user (SparklyR or SparkR).
Docker Image | Image Type | User Language(s) | What's Included?
:-- | :-- | :-- | :--
[aztk/base](https://hub.docker.com/r/aztk/base/) | Base | Java, Scala | `Spark`
[aztk/python](https://hub.docker.com/r/aztk/python/) | Pyspark | Python | `Anaconda`</br>`Jupyter Notebooks` </br> `PySpark`
[aztk/r-base](https://hub.docker.com/r/aztk/r-base/) | SparklyR | R | `CRAN`</br>`RStudio Server`</br>`SparklyR and SparkR`
__aztk/gpu__, __aztk/python__ and __aztk/r-base__ images are built on top of the __aztk/base__ image.
All the AZTK images are hosted on Docker Hub under [aztk](https://hub.docker.com/r/aztk).
### Matrix of Supported Container Images:
Docker Repo (hosted on Docker Hub) | Spark Version | Python Version | R Version | CUDA Version | cuDNN Version
:-- | :-- | :-- | :-- | :-- | :--
aztk/base:spark2.2.0 __(default)__ | v2.2.0 | -- | -- | -- | --
aztk/base:spark2.1.0 | v2.1.0 | -- | -- | -- | --
aztk/base:spark1.6.3 | v1.6.3 | -- | -- | -- | --
aztk/gpu:spark2.2.0 | v2.2.0 | -- | -- | 8.0 | 6.0
aztk/gpu:spark2.1.0 | v2.1.0 | -- | -- | 8.0 | 6.0
aztk/gpu:spark1.6.3 | v1.6.3 | -- | -- | 8.0 | 6.0
aztk/python:spark2.2.0-python3.6.2-base | v2.2.0 | v3.6.2 | -- | -- | --
aztk/python:spark2.1.0-python3.6.2-base | v2.1.0 | v3.6.2 | -- | -- | --
aztk/python:spark1.6.3-python3.6.2-base | v1.6.3 | v3.6.2 | -- | -- | --
aztk/python:spark2.2.0-python3.6.2-gpu | v2.2.0 | v3.6.2 | -- | 8.0 | 6.0
aztk/python:spark2.1.0-python3.6.2-gpu | v2.1.0 | v3.6.2 | -- | 8.0 | 6.0
aztk/python:spark1.6.3-python3.6.2-gpu | v1.6.3 | v3.6.2 | -- | 8.0 | 6.0
aztk/r-base:spark2.2.0-r3.4.1-base | v2.2.0 | -- | v3.4.1 | -- | --
aztk/r-base:spark2.1.0-r3.4.1-base | v2.1.0 | -- | v3.4.1 | -- | --
aztk/r-base:spark1.6.3-r3.4.1-base | v1.6.3 | -- | v3.4.1 | -- | --
If you have requests to add to the list of supported images, please file a Github issue.
NOTE: Spark clusters that use the __aztk/gpu__, __aztk/python__ or __aztk/r-base__ images take longer to provision because these Docker images are significantly larger than the __aztk/base__ image.
### Gallery of 3rd Party Images
Since this toolkit uses Docker containers to run Spark, users can bring their own images. Here's a list of 3rd party images:
- *coming soon*
(See below for a how-to guide on building your own images for the Azure Distributed Data Engineering Toolkit)
# How do I use my own Docker Image?
Building your own Docker Image to use with this toolkit has many advantages for users who want more customization over their environment. For some, this may look like installing specific, and even private, libraries that their Spark jobs require. For others, it may just be setting up a version of Spark, Python or R that fits their particular needs.
This section is for users who want to build their own docker images.
## Building Your Own Docker Image
The Azure Distributed Data Engineering Toolkit supports custom Docker images. To guarantee that your Spark deployment works, we recommend that you build on top of one of our __aztk/base__ images. You can also build on top of our __aztk/python__ or __aztk/r-base__ images, but note that they are also built on top of the __aztk/base__ image.
To build your own image, you can either build _on top of_ or _beneath_ one of our supported images, _OR_ you can just modify one of the supported Dockerfiles to build your own.
### Building on top
You can build on top of our images by referencing the __aztk/base__ image in the **FROM** keyword of your Dockerfile:
```sh
# Your custom Dockerfile
FROM aztk/base:spark2.2.0
...
```
### Building beneath
To build beneath one of our images, modify one of our Dockerfiles so that the **FROM** keyword pulls from your Docker image's location (as opposed to the default which is a base Ubuntu image):
```sh
# One of the Dockerfiles that AZTK supports
# Change the FROM statement to point to your hosted image repo
FROM my_username/my_repo:latest
...
```
Please note that for this method to work, your Docker image must have been built on Ubuntu.
## Required Environment Variables
When layering your own Docker image, make sure your image does not interfere with the environment variables set in the __aztk/base__ Dockerfile, otherwise it may not work on AZTK.
Please make sure that the following environment variables are set:
- AZTK_PYTHON_VERSION
- JAVA_HOME
- SPARK_HOME
You also need to make sure that __PATH__ is correctly configured with $SPARK_HOME
- PATH=$SPARK_HOME/bin:$PATH
By default, these are set as follows:
``` sh
ENV AZTK_PYTHON_VERSION 3.5.4
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
ENV SPARK_HOME /home/spark-current
ENV PATH $SPARK_HOME/bin:$PATH
```
If you are using your own version of Spark, make sure that it is symlinked at "/home/spark-current". **$SPARK_HOME** must also point to "/home/spark-current".
## Hosting your Docker Image
By default, this toolkit assumes that your Docker images are publicly hosted on Docker Hub. However, we also support hosting your images privately.
See [here](https://github.com/Azure/aztk/blob/master/docs/12-docker-image.md#using-a-custom-docker-image-that-is-privately-hosted) to learn more about using privately hosted Docker Images.
## Learn More
The Dockerfiles in this directory are used to build the Docker images used by this toolkit. Please reference the individual directories for more information on each Dockerfile:
- [Base](./base)
- [Python](./python)
- [R](./r)

View file

@ -0,0 +1,23 @@
FROM aztk/spark:v0.1.0-spark1.6.3-base
ARG ANACONDA_VERSION=Anaconda3-5.1.0
ENV PATH /opt/conda/bin:$PATH
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
RUN apt-get update --fix-missing && apt-get install -y wget bzip2 ca-certificates \
libglib2.0-0 libxext6 libsm6 libxrender1 \
git mercurial subversion \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget --quiet https://repo.continuum.io/archive/${ANACONDA_VERSION}-Linux-x86_64.sh -O ~/anaconda.sh \
&& /bin/bash ~/anaconda.sh -b -p /opt/conda \
&& rm ~/anaconda.sh \
&& ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc \
# reset default python to 3.5
&& rm /usr/bin/python \
&& ln -s /usr/bin/python3.5 /usr/bin/python
CMD ["/bin/bash"]
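
These Anaconda Dockerfiles differ only in their FROM line, so they can all be built and tagged the same way; a hedged local build example (the directory layout and the "-anaconda-base" tag suffix are assumed to follow the same convention as the base images in docker-image/build.sh):

```sh
# illustrative build of the Anaconda flavor on top of the spark1.6.3 base image
docker build anaconda/spark1.6.3/base/ \
    --build-arg ANACONDA_VERSION=Anaconda3-5.1.0 \
    --tag aztk/spark:v0.1.0-spark1.6.3-anaconda-base
```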

View file

@ -0,0 +1,23 @@
FROM aztk/spark:v0.1.0-spark1.6.3-gpu
ARG ANACONDA_VERSION=Anaconda3-5.1.0
ENV PATH /opt/conda/bin:$PATH
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
RUN apt-get update --fix-missing && apt-get install -y wget bzip2 ca-certificates \
libglib2.0-0 libxext6 libsm6 libxrender1 \
git mercurial subversion \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget --quiet https://repo.continuum.io/archive/${ANACONDA_VERSION}-Linux-x86_64.sh -O ~/anaconda.sh \
&& /bin/bash ~/anaconda.sh -b -p /opt/conda \
&& rm ~/anaconda.sh \
&& ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc \
# reset default python to 3.5
&& rm /usr/bin/python \
&& ln -s /usr/bin/python3.5 /usr/bin/python
CMD ["/bin/bash"]

View file

@ -0,0 +1,23 @@
FROM aztk/spark:v0.1.0-spark2.1.0-base
ARG ANACONDA_VERSION=Anaconda3-5.1.0
ENV PATH /opt/conda/bin:$PATH
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
RUN apt-get update --fix-missing && apt-get install -y wget bzip2 ca-certificates \
libglib2.0-0 libxext6 libsm6 libxrender1 \
git mercurial subversion \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget --quiet https://repo.continuum.io/archive/${ANACONDA_VERSION}-Linux-x86_64.sh -O ~/anaconda.sh \
&& /bin/bash ~/anaconda.sh -b -p /opt/conda \
&& rm ~/anaconda.sh \
&& ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc \
# reset default python to 3.5
&& rm /usr/bin/python \
&& ln -s /usr/bin/python3.5 /usr/bin/python
CMD ["/bin/bash"]

View file

@ -0,0 +1,23 @@
FROM aztk/spark:v0.1.0-spark2.1.0-gpu
ARG ANACONDA_VERSION=Anaconda3-5.1.0
ENV PATH /opt/conda/bin:$PATH
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
RUN apt-get update --fix-missing && apt-get install -y wget bzip2 ca-certificates \
libglib2.0-0 libxext6 libsm6 libxrender1 \
git mercurial subversion \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget --quiet https://repo.continuum.io/archive/${ANACONDA_VERSION}-Linux-x86_64.sh -O ~/anaconda.sh \
&& /bin/bash ~/anaconda.sh -b -p /opt/conda \
&& rm ~/anaconda.sh \
&& ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc \
# reset default python to 3.5
&& rm /usr/bin/python \
&& ln -s /usr/bin/python3.5 /usr/bin/python
CMD ["/bin/bash"]

View file

@ -0,0 +1,23 @@
FROM aztk/spark:v0.1.0-spark2.2.0-base
ARG ANACONDA_VERSION=Anaconda3-5.1.0
ENV PATH /opt/conda/bin:$PATH
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
RUN apt-get update --fix-missing && apt-get install -y wget bzip2 ca-certificates \
libglib2.0-0 libxext6 libsm6 libxrender1 \
git mercurial subversion \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget --quiet https://repo.continuum.io/archive/${ANACONDA_VERSION}-Linux-x86_64.sh -O ~/anaconda.sh \
&& /bin/bash ~/anaconda.sh -b -p /opt/conda \
&& rm ~/anaconda.sh \
&& ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc \
# reset default python to 3.5
&& rm /usr/bin/python \
&& ln -s /usr/bin/python3.5 /usr/bin/python
CMD ["/bin/bash"]

View file

@ -0,0 +1,23 @@
FROM aztk/spark:v0.1.0-spark2.2.0-gpu
ARG ANACONDA_VERSION=Anaconda3-5.1.0
ENV PATH /opt/conda/bin:$PATH
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
RUN apt-get update --fix-missing && apt-get install -y wget bzip2 ca-certificates \
libglib2.0-0 libxext6 libsm6 libxrender1 \
git mercurial subversion \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget --quiet https://repo.continuum.io/archive/${ANACONDA_VERSION}-Linux-x86_64.sh -O ~/anaconda.sh \
&& /bin/bash ~/anaconda.sh -b -p /opt/conda \
&& rm ~/anaconda.sh \
&& ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc \
# reset default python to 3.5
&& rm /usr/bin/python \
&& ln -s /usr/bin/python3.5 /usr/bin/python
CMD ["/bin/bash"]

View file

@ -0,0 +1,23 @@
FROM aztk/spark:v0.1.0-spark2.3.0-base
ARG ANACONDA_VERSION=Anaconda3-5.1.0
ENV PATH /opt/conda/bin:$PATH
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
RUN apt-get update --fix-missing && apt-get install -y wget bzip2 ca-certificates \
libglib2.0-0 libxext6 libsm6 libxrender1 \
git mercurial subversion \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget --quiet https://repo.continuum.io/archive/${ANACONDA_VERSION}-Linux-x86_64.sh -O ~/anaconda.sh \
&& /bin/bash ~/anaconda.sh -b -p /opt/conda \
&& rm ~/anaconda.sh \
&& ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc \
# reset default python to 3.5
&& rm /usr/bin/python \
&& ln -s /usr/bin/python3.5 /usr/bin/python
CMD ["/bin/bash"]

View file

@ -0,0 +1,23 @@
FROM aztk/spark:v0.1.0-spark2.3.0-gpu
ARG ANACONDA_VERSION=Anaconda3-5.1.0
ENV PATH /opt/conda/bin:$PATH
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
RUN apt-get update --fix-missing && apt-get install -y wget bzip2 ca-certificates \
libglib2.0-0 libxext6 libsm6 libxrender1 \
git mercurial subversion \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget --quiet https://repo.continuum.io/archive/${ANACONDA_VERSION}-Linux-x86_64.sh -O ~/anaconda.sh \
&& /bin/bash ~/anaconda.sh -b -p /opt/conda \
&& rm ~/anaconda.sh \
&& ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc \
# reset default python to 3.5
&& rm /usr/bin/python \
&& ln -s /usr/bin/python3.5 /usr/bin/python
CMD ["/bin/bash"]

View file

@ -1,16 +1,22 @@
# Ubuntu 16.04 (Xenial)
FROM ubuntu:16.04
# set version of python required for thunderbolt application
ENV AZTK_PYTHON_VERSION=3.5.4
# set AZTK version compatibility
ENV AZTK_DOCKER_IMAGE_VERSION 0.1.0
# set version of python required for aztk
ENV AZTK_PYTHON_VERSION=3.5.2
# modify these ARGs on build time to specify your desired versions of Spark/Hadoop
ARG SPARK_VERSION_KEY=spark-1.6.3-bin-hadoop2.6
ENV SPARK_VERSION_KEY 1.6.3
ENV SPARK_FULL_VERSION spark-${SPARK_VERSION_KEY}-bin-without-hadoop
ENV HADOOP_VERSION 2.8.3
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
# set up env vars for pyenv
ENV HOME /
ENV PYENV_ROOT $HOME/.pyenv
ENV PATH $PYENV_ROOT/shims:$PYENV_ROOT/bin:$PATH
# set env vars
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
ENV SPARK_HOME /home/spark-current
ENV PATH $SPARK_HOME/bin:$PATH
RUN apt-get clean \
&& apt-get update -y \
@ -23,39 +29,130 @@ RUN apt-get clean \
libbz2-dev \
libreadline-dev \
libsqlite3-dev \
maven \
wget \
curl \
llvm \
git \
libncurses5-dev \
libncursesw5-dev \
python3-pip \
python3-venv \
xz-utils \
tk-dev \
&& apt-get update -y \
# install [software-properties-common]
# so we can use [apt-add-repository] to add the repository [ppa:webupd8team/java]
# install [software-properties-common]
# so we can use [apt-add-repository] to add the repository [ppa:webupd8team/java]
# from which we install Java8
&& apt-get install -y --no-install-recommends software-properties-common \
&& apt-add-repository ppa:webupd8team/java -y \
&& apt-get update -y \
# install java
&& apt-get install -y --no-install-recommends default-jdk \
# download pyenv
&& git clone git://github.com/yyuu/pyenv.git .pyenv \
&& git clone https://github.com/yyuu/pyenv-virtualenv.git ~/.pyenv/plugins/pyenv-virtualenv \
# install & setup pyenv
&& eval "$(pyenv init -)" \
&& echo 'eval "$(pyenv init -)"' >> ~/.bashrc \
# install aztk required python version
&& env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install -f $AZTK_PYTHON_VERSION \
&& pyenv global $AZTK_PYTHON_VERSION \
# install spark & setup symlink to SPARK_HOME
&& curl https://d3kbcqa49mib13.cloudfront.net/$SPARK_VERSION_KEY.tgz | tar xvz -C /home \
&& ln -s /home/$SPARK_VERSION_KEY /home/spark-current
# set env vars
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
ENV SPARK_HOME /home/spark-current
ENV PATH $SPARK_HOME/bin:$PATH
# set up user python and aztk python
&& ln -s /usr/bin/python3.5 /usr/bin/python \
&& /usr/bin/python -m pip install --upgrade pip setuptools wheel \
&& apt-get remove -y python3-pip \
# build and install spark
&& git clone https://github.com/apache/spark.git \
&& cd spark \
&& git checkout tags/v${SPARK_VERSION_KEY} \
&& export MAVEN_OPTS="-Xmx3g -XX:ReservedCodeCacheSize=1024m" \
&& ./make-distribution.sh --name custom-spark --tgz -Phive -Phive-thriftserver -Dhadoop.version=${HADOOP_VERSION} -Phadoop-2.6 -DskipTests \
&& tar -xvzf /spark/spark-${SPARK_VERSION_KEY}-bin-custom-spark.tgz --directory=/home \
&& ln -s "/home/spark-${SPARK_VERSION_KEY}-bin-custom-spark" /home/spark-current \
&& rm -rf /spark \
# copy azure storage jars and dependencies to $SPARK_HOME/jars
&& echo "<project>" \
"<modelVersion>4.0.0</modelVersion>" \
"<groupId>groupId</groupId>" \
"<artifactId>artifactId</artifactId>" \
"<version>1.0</version>" \
"<dependencies>" \
"<dependency>" \
"<groupId>org.apache.hadoop</groupId>" \
"<artifactId>hadoop-azure-datalake</artifactId>" \
"<version>${HADOOP_VERSION}</version>" \
"<exclusions>" \
"<exclusion>" \
"<groupId>org.apache.hadoop</groupId>" \
"<artifactId>hadoop-common</artifactId>" \
"</exclusion>" \
"</exclusions> " \
"</dependency>" \
"<dependency>" \
"<groupId>org.apache.hadoop</groupId>" \
"<artifactId>hadoop-azure</artifactId>" \
"<version>${HADOOP_VERSION}</version>" \
"<exclusions>" \
"<exclusion>" \
"<groupId>org.apache.hadoop</groupId>" \
"<artifactId>hadoop-common</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>com.fasterxml.jackson.core</groupId>" \
"<artifactId>jackson-core</artifactId>" \
"</exclusion>" \
"</exclusions> " \
"</dependency>" \
"<dependency>" \
"<groupId>com.microsoft.sqlserver</groupId>" \
"<artifactId>mssql-jdbc</artifactId>" \
"<version>6.4.0.jre8</version>" \
"</dependency>" \
"<dependency>" \
"<groupId>com.microsoft.azure</groupId>" \
"<artifactId>azure-storage</artifactId>" \
"<version>2.2.0</version>" \
"<exclusions>" \
"<exclusion>" \
"<groupId>com.fasterxml.jackson.core</groupId>" \
"<artifactId>jackson-core</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>org.apache.commons</groupId>" \
"<artifactId>commons-lang3</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>org.slf4j</groupId>" \
"<artifactId>slf4j-api</artifactId>" \
"</exclusion>" \
"</exclusions>" \
"</dependency>" \
"<dependency>" \
"<groupId>com.microsoft.azure</groupId>" \
"<artifactId>azure-cosmosdb-spark_2.1.0_2.11</artifactId>" \
"<version>1.1.1</version>" \
"<exclusions>" \
"<exclusion>" \
"<groupId>org.apache.tinkerpop</groupId>" \
"<artifactId>tinkergraph-gremlin</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>org.apache.tinkerpop</groupId>" \
"<artifactId>spark-gremlin</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>io.netty</groupId>" \
"<artifactId>*</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>com.fasterxml.jackson.core</groupId>" \
"<artifactId>jackson-annotations</artifactId>" \
"</exclusion>" \
"</exclusions> " \
"</dependency>" \
"</dependencies>" \
"</project>" > /tmp/pom.xml \
&& cd /tmp \
&& mvn dependency:copy-dependencies -DoutputDirectory="${SPARK_HOME}/jars/" \
# cleanup
&& apt-get --purge autoremove -y maven python3-pip \
&& apt-get autoremove -y \
&& apt-get autoclean -y \
&& rm -rf /tmp/* \
&& rm -rf /root/.cache \
&& rm -rf /root/.m2 \
&& rm -rf /var/lib/apt/lists/*
CMD ["/bin/bash"]
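
The pom.xml echoed into /tmp above exists only so that mvn dependency:copy-dependencies drops the Azure connectors straight into Spark's jar directory; a quick sanity check of the resulting image (a sketch, assuming the build completed and was tagged as in docker-image/build.sh):

```sh
# the Azure storage/SQL/Cosmos DB connectors should now sit next to Spark's own jars
docker run --rm aztk/spark:v0.1.0-spark1.6.3-base \
    /bin/bash -c 'ls $SPARK_HOME/jars | grep -E "hadoop-azure|azure-storage|azure-cosmosdb-spark|mssql-jdbc"'
```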

View file

@ -1,16 +1,22 @@
# Ubuntu 16.04 (Xenial)
FROM ubuntu:16.04
# set version of python required for thunderbolt application
ENV AZTK_PYTHON_VERSION=3.5.4
# set AZTK version compatibility
ENV AZTK_DOCKER_IMAGE_VERSION 0.1.0
# set version of python required for aztk
ENV AZTK_PYTHON_VERSION=3.5.2
# modify these ARGs on build time to specify your desired versions of Spark/Hadoop
ARG SPARK_VERSION_KEY=spark-2.1.0-bin-hadoop2.7
ENV SPARK_VERSION_KEY 2.1.0
ENV SPARK_FULL_VERSION spark-${SPARK_VERSION_KEY}-bin-without-hadoop
ENV HADOOP_VERSION 2.8.3
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
# set up env vars for pyenv
ENV HOME /
ENV PYENV_ROOT $HOME/.pyenv
ENV PATH $PYENV_ROOT/shims:$PYENV_ROOT/bin:$PATH
# set env vars
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
ENV SPARK_HOME /home/spark-current
ENV PATH $SPARK_HOME/bin:$PATH
RUN apt-get clean \
&& apt-get update -y \
@ -23,39 +29,130 @@ RUN apt-get clean \
libbz2-dev \
libreadline-dev \
libsqlite3-dev \
maven \
wget \
curl \
llvm \
git \
libncurses5-dev \
libncursesw5-dev \
python3-pip \
python3-venv \
xz-utils \
tk-dev \
&& apt-get update -y \
# install [software-properties-common]
# so we can use [apt-add-repository] to add the repository [ppa:webupd8team/java]
# install [software-properties-common]
# so we can use [apt-add-repository] to add the repository [ppa:webupd8team/java]
# from which we install Java8
&& apt-get install -y --no-install-recommends software-properties-common \
&& apt-add-repository ppa:webupd8team/java -y \
&& apt-get update -y \
# install java
&& apt-get install -y --no-install-recommends default-jdk \
# download pyenv
&& git clone git://github.com/yyuu/pyenv.git .pyenv \
&& git clone https://github.com/yyuu/pyenv-virtualenv.git ~/.pyenv/plugins/pyenv-virtualenv \
# install & setup pyenv
&& eval "$(pyenv init -)" \
&& echo 'eval "$(pyenv init -)"' >> ~/.bashrc \
# install aztk required python version
&& env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install -f $AZTK_PYTHON_VERSION \
&& pyenv global $AZTK_PYTHON_VERSION \
# install spark & setup symlink to SPARK_HOME
&& curl https://d3kbcqa49mib13.cloudfront.net/$SPARK_VERSION_KEY.tgz | tar xvz -C /home \
&& ln -s /home/$SPARK_VERSION_KEY /home/spark-current
# set env vars
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
ENV SPARK_HOME /home/spark-current
ENV PATH $SPARK_HOME/bin:$PATH
# set up user python and aztk python
&& ln -s /usr/bin/python3.5 /usr/bin/python \
&& /usr/bin/python -m pip install --upgrade pip setuptools wheel \
&& apt-get remove -y python3-pip \
# build and install spark
&& git clone https://github.com/apache/spark.git \
&& cd spark \
&& git checkout tags/v${SPARK_VERSION_KEY} \
&& export MAVEN_OPTS="-Xmx3g -XX:ReservedCodeCacheSize=1024m" \
&& ./dev/make-distribution.sh --name custom-spark --pip --tgz -Phive -Phive-thriftserver -Dhadoop.version=${HADOOP_VERSION} -DskipTests \
&& tar -xvzf /spark/spark-${SPARK_VERSION_KEY}-bin-custom-spark.tgz --directory=/home \
&& ln -s "/home/spark-${SPARK_VERSION_KEY}-bin-custom-spark" /home/spark-current \
&& rm -rf /spark \
# copy azure storage jars and dependencies to $SPARK_HOME/jars
&& echo "<project>" \
"<modelVersion>4.0.0</modelVersion>" \
"<groupId>groupId</groupId>" \
"<artifactId>artifactId</artifactId>" \
"<version>1.0</version>" \
"<dependencies>" \
"<dependency>" \
"<groupId>org.apache.hadoop</groupId>" \
"<artifactId>hadoop-azure-datalake</artifactId>" \
"<version>${HADOOP_VERSION}</version>" \
"<exclusions>" \
"<exclusion>" \
"<groupId>org.apache.hadoop</groupId>" \
"<artifactId>hadoop-common</artifactId>" \
"</exclusion>" \
"</exclusions> " \
"</dependency>" \
"<dependency>" \
"<groupId>org.apache.hadoop</groupId>" \
"<artifactId>hadoop-azure</artifactId>" \
"<version>${HADOOP_VERSION}</version>" \
"<exclusions>" \
"<exclusion>" \
"<groupId>org.apache.hadoop</groupId>" \
"<artifactId>hadoop-common</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>com.fasterxml.jackson.core</groupId>" \
"<artifactId>jackson-core</artifactId>" \
"</exclusion>" \
"</exclusions> " \
"</dependency>" \
"<dependency>" \
"<groupId>com.microsoft.sqlserver</groupId>" \
"<artifactId>mssql-jdbc</artifactId>" \
"<version>6.4.0.jre8</version>" \
"</dependency>" \
"<dependency>" \
"<groupId>com.microsoft.azure</groupId>" \
"<artifactId>azure-storage</artifactId>" \
"<version>2.2.0</version>" \
"<exclusions>" \
"<exclusion>" \
"<groupId>com.fasterxml.jackson.core</groupId>" \
"<artifactId>jackson-core</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>org.apache.commons</groupId>" \
"<artifactId>commons-lang3</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>org.slf4j</groupId>" \
"<artifactId>slf4j-api</artifactId>" \
"</exclusion>" \
"</exclusions>" \
"</dependency>" \
"<dependency>" \
"<groupId>com.microsoft.azure</groupId>" \
"<artifactId>azure-cosmosdb-spark_${SPARK_VERSION_KEY}_2.11</artifactId>" \
"<version>1.1.1</version>" \
"<exclusions>" \
"<exclusion>" \
"<groupId>org.apache.tinkerpop</groupId>" \
"<artifactId>tinkergraph-gremlin</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>org.apache.tinkerpop</groupId>" \
"<artifactId>spark-gremlin</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>io.netty</groupId>" \
"<artifactId>*</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>com.fasterxml.jackson.core</groupId>" \
"<artifactId>jackson-annotations</artifactId>" \
"</exclusion>" \
"</exclusions> " \
"</dependency>" \
"</dependencies>" \
"</project>" > /tmp/pom.xml \
&& cd /tmp \
&& mvn dependency:copy-dependencies -DoutputDirectory="${SPARK_HOME}/jars/" \
# cleanup
&& apt-get --purge autoremove -y maven python3-pip \
&& apt-get autoremove -y \
&& apt-get autoclean -y \
&& rm -rf /tmp/* \
&& rm -rf /root/.cache \
&& rm -rf /root/.m2 \
&& rm -rf /var/lib/apt/lists/*
CMD ["/bin/bash"]

View file

@ -1,16 +1,22 @@
# Ubuntu 16.04 (Xenial)
FROM ubuntu:16.04
# set version of python required for thunderbolt application
ENV AZTK_PYTHON_VERSION=3.5.4
# set AZTK version compatibility
ENV AZTK_DOCKER_IMAGE_VERSION 0.1.0
# set version of python required for aztk
ENV AZTK_PYTHON_VERSION=3.5.2
# modify these ARGs on build time to specify your desired versions of Spark/Hadoop
ARG SPARK_VERSION_KEY=spark-2.2.0-bin-hadoop2.7
ENV SPARK_VERSION_KEY 2.2.0
ENV SPARK_FULL_VERSION spark-${SPARK_VERSION_KEY}-bin-without-hadoop
ENV HADOOP_VERSION 2.8.3
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
# set up env vars for pyenv
ENV HOME /
ENV PYENV_ROOT $HOME/.pyenv
ENV PATH $PYENV_ROOT/shims:$PYENV_ROOT/bin:$PATH
# set env vars
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
ENV SPARK_HOME /home/spark-current
ENV PATH $SPARK_HOME/bin:$PATH
RUN apt-get clean \
&& apt-get update -y \
@ -23,39 +29,129 @@ RUN apt-get clean \
libbz2-dev \
libreadline-dev \
libsqlite3-dev \
maven \
wget \
curl \
llvm \
git \
libncurses5-dev \
libncursesw5-dev \
python3-pip \
python3-venv \
xz-utils \
tk-dev \
&& apt-get update -y \
# install [software-properties-common]
# so we can use [apt-add-repository] to add the repository [ppa:webupd8team/java]
# install [software-properties-common]
# so we can use [apt-add-repository] to add the repository [ppa:webupd8team/java]
# from which we install Java8
&& apt-get install -y --no-install-recommends software-properties-common \
&& apt-add-repository ppa:webupd8team/java -y \
&& apt-get update -y \
# install java
&& apt-get install -y --no-install-recommends default-jdk \
# download pyenv
&& git clone git://github.com/yyuu/pyenv.git .pyenv \
&& git clone https://github.com/yyuu/pyenv-virtualenv.git ~/.pyenv/plugins/pyenv-virtualenv \
# install & setup pyenv
&& eval "$(pyenv init -)" \
&& echo 'eval "$(pyenv init -)"' >> ~/.bashrc \
# install aztk required python version
&& env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install -f $AZTK_PYTHON_VERSION \
&& pyenv global $AZTK_PYTHON_VERSION \
# install spark & setup symlink to SPARK_HOME
&& curl https://d3kbcqa49mib13.cloudfront.net/$SPARK_VERSION_KEY.tgz | tar xvz -C /home \
&& ln -s /home/$SPARK_VERSION_KEY /home/spark-current
# set env vars
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
ENV SPARK_HOME /home/spark-current
ENV PATH $SPARK_HOME/bin:$PATH
# set up user python
&& ln -s /usr/bin/python3.5 /usr/bin/python \
&& /usr/bin/python -m pip install --upgrade pip setuptools wheel \
# build and install spark
&& git clone https://github.com/apache/spark.git \
&& cd spark \
&& git checkout tags/v${SPARK_VERSION_KEY} \
&& export MAVEN_OPTS="-Xmx3g -XX:ReservedCodeCacheSize=1024m" \
&& ./dev/make-distribution.sh --name custom-spark --pip --tgz -Phive -Phive-thriftserver -Dhadoop.version=${HADOOP_VERSION} -DskipTests \
&& tar -xvzf /spark/spark-${SPARK_VERSION_KEY}-bin-custom-spark.tgz --directory=/home \
&& ln -s "/home/spark-${SPARK_VERSION_KEY}-bin-custom-spark" /home/spark-current \
&& rm -rf /spark \
# copy azure storage jars and dependencies to $SPARK_HOME/jars
&& echo "<project>" \
"<modelVersion>4.0.0</modelVersion>" \
"<groupId>groupId</groupId>" \
"<artifactId>artifactId</artifactId>" \
"<version>1.0</version>" \
"<dependencies>" \
"<dependency>" \
"<groupId>org.apache.hadoop</groupId>" \
"<artifactId>hadoop-azure-datalake</artifactId>" \
"<version>${HADOOP_VERSION}</version>" \
"<exclusions>" \
"<exclusion>" \
"<groupId>org.apache.hadoop</groupId>" \
"<artifactId>hadoop-common</artifactId>" \
"</exclusion>" \
"</exclusions> " \
"</dependency>" \
"<dependency>" \
"<groupId>org.apache.hadoop</groupId>" \
"<artifactId>hadoop-azure</artifactId>" \
"<version>${HADOOP_VERSION}</version>" \
"<exclusions>" \
"<exclusion>" \
"<groupId>org.apache.hadoop</groupId>" \
"<artifactId>hadoop-common</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>com.fasterxml.jackson.core</groupId>" \
"<artifactId>jackson-core</artifactId>" \
"</exclusion>" \
"</exclusions> " \
"</dependency>" \
"<dependency>" \
"<groupId>com.microsoft.sqlserver</groupId>" \
"<artifactId>mssql-jdbc</artifactId>" \
"<version>6.4.0.jre8</version>" \
"</dependency>" \
"<dependency>" \
"<groupId>com.microsoft.azure</groupId>" \
"<artifactId>azure-storage</artifactId>" \
"<version>2.2.0</version>" \
"<exclusions>" \
"<exclusion>" \
"<groupId>com.fasterxml.jackson.core</groupId>" \
"<artifactId>jackson-core</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>org.apache.commons</groupId>" \
"<artifactId>commons-lang3</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>org.slf4j</groupId>" \
"<artifactId>slf4j-api</artifactId>" \
"</exclusion>" \
"</exclusions>" \
"</dependency>" \
"<dependency>" \
"<groupId>com.microsoft.azure</groupId>" \
"<artifactId>azure-cosmosdb-spark_${SPARK_VERSION_KEY}_2.11</artifactId>" \
"<version>1.1.1</version>" \
"<exclusions>" \
"<exclusion>" \
"<groupId>org.apache.tinkerpop</groupId>" \
"<artifactId>tinkergraph-gremlin</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>org.apache.tinkerpop</groupId>" \
"<artifactId>spark-gremlin</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>io.netty</groupId>" \
"<artifactId>*</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>com.fasterxml.jackson.core</groupId>" \
"<artifactId>jackson-annotations</artifactId>" \
"</exclusion>" \
"</exclusions> " \
"</dependency>" \
"</dependencies>" \
"</project>" > /tmp/pom.xml \
&& cd /tmp \
&& mvn dependency:copy-dependencies -DoutputDirectory="${SPARK_HOME}/jars/" \
# cleanup
&& apt-get --purge autoremove -y maven python3-pip \
&& apt-get autoremove -y \
&& apt-get autoclean -y \
&& rm -rf /tmp/* \
&& rm -rf /root/.cache \
&& rm -rf /root/.m2 \
&& rm -rf /var/lib/apt/lists/*
CMD ["/bin/bash"]

View file

@ -0,0 +1,158 @@
# Ubuntu 16.04 (Xenial)
FROM ubuntu:16.04
# set AZTK version compatibility
ENV AZTK_DOCKER_IMAGE_VERSION 0.1.0
# set version of python required for aztk
ENV AZTK_PYTHON_VERSION=3.5.2
# modify these ARGs on build time to specify your desired versions of Spark/Hadoop
ENV SPARK_VERSION_KEY 2.3.0
ENV SPARK_FULL_VERSION spark-${SPARK_VERSION_KEY}-bin-without-hadoop
ENV HADOOP_VERSION 2.8.3
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
# set env vars
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
ENV SPARK_HOME /home/spark-current
ENV PATH $SPARK_HOME/bin:$PATH
RUN apt-get clean \
&& apt-get update -y \
# install dependency packages
&& apt-get install -y --no-install-recommends \
make \
build-essential \
zlib1g-dev \
libssl-dev \
libbz2-dev \
libreadline-dev \
libsqlite3-dev \
maven \
wget \
curl \
llvm \
git \
libncurses5-dev \
libncursesw5-dev \
python3-pip \
python3-venv \
xz-utils \
tk-dev \
&& apt-get update -y \
# install [software-properties-common]
# so we can use [apt-add-repository] to add the repository [ppa:webupd8team/java]
# from which we install Java8
&& apt-get install -y --no-install-recommends software-properties-common \
&& apt-add-repository ppa:webupd8team/java -y \
&& apt-get update -y \
# install java
&& apt-get install -y --no-install-recommends default-jdk \
# set up user python and aztk python
&& ln -s /usr/bin/python3.5 /usr/bin/python \
&& /usr/bin/python -m pip install --upgrade pip setuptools wheel \
&& apt-get remove -y python3-pip \
# build and install spark
&& git clone https://github.com/apache/spark.git \
&& cd spark \
&& git checkout tags/v${SPARK_VERSION_KEY} \
&& export MAVEN_OPTS="-Xmx3g -XX:ReservedCodeCacheSize=1024m" \
&& ./dev/make-distribution.sh --name custom-spark --pip --tgz -Phive -Phive-thriftserver -Dhadoop.version=${HADOOP_VERSION} -DskipTests \
&& tar -xvzf /spark/spark-${SPARK_VERSION_KEY}-bin-custom-spark.tgz --directory=/home \
&& ln -s "/home/spark-${SPARK_VERSION_KEY}-bin-custom-spark" /home/spark-current \
&& rm -rf /spark \
# copy azure storage jars and dependencies to $SPARK_HOME/jars
&& echo "<project>" \
"<modelVersion>4.0.0</modelVersion>" \
"<groupId>groupId</groupId>" \
"<artifactId>artifactId</artifactId>" \
"<version>1.0</version>" \
"<dependencies>" \
"<dependency>" \
"<groupId>org.apache.hadoop</groupId>" \
"<artifactId>hadoop-azure-datalake</artifactId>" \
"<version>${HADOOP_VERSION}</version>" \
"<exclusions>" \
"<exclusion>" \
"<groupId>org.apache.hadoop</groupId>" \
"<artifactId>hadoop-common</artifactId>" \
"</exclusion>" \
"</exclusions> " \
"</dependency>" \
"<dependency>" \
"<groupId>org.apache.hadoop</groupId>" \
"<artifactId>hadoop-azure</artifactId>" \
"<version>${HADOOP_VERSION}</version>" \
"<exclusions>" \
"<exclusion>" \
"<groupId>org.apache.hadoop</groupId>" \
"<artifactId>hadoop-common</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>com.fasterxml.jackson.core</groupId>" \
"<artifactId>jackson-core</artifactId>" \
"</exclusion>" \
"</exclusions> " \
"</dependency>" \
"<dependency>" \
"<groupId>com.microsoft.sqlserver</groupId>" \
"<artifactId>mssql-jdbc</artifactId>" \
"<version>6.4.0.jre8</version>" \
"</dependency>" \
"<dependency>" \
"<groupId>com.microsoft.azure</groupId>" \
"<artifactId>azure-storage</artifactId>" \
"<version>2.2.0</version>" \
"<exclusions>" \
"<exclusion>" \
"<groupId>com.fasterxml.jackson.core</groupId>" \
"<artifactId>jackson-core</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>org.apache.commons</groupId>" \
"<artifactId>commons-lang3</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>org.slf4j</groupId>" \
"<artifactId>slf4j-api</artifactId>" \
"</exclusion>" \
"</exclusions>" \
"</dependency>" \
"<dependency>" \
"<groupId>com.microsoft.azure</groupId>" \
"<artifactId>azure-cosmosdb-spark_2.2.0_2.11</artifactId>" \
"<version>1.1.1</version>" \
"<exclusions>" \
"<exclusion>" \
"<groupId>org.apache.tinkerpop</groupId>" \
"<artifactId>tinkergraph-gremlin</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>org.apache.tinkerpop</groupId>" \
"<artifactId>spark-gremlin</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>io.netty</groupId>" \
"<artifactId>*</artifactId>" \
"</exclusion>" \
"<exclusion>" \
"<groupId>com.fasterxml.jackson.core</groupId>" \
"<artifactId>jackson-annotations</artifactId>" \
"</exclusion>" \
"</exclusions> " \
"</dependency>" \
"</dependencies>" \
"</project>" > /tmp/pom.xml \
&& cd /tmp \
&& mvn dependency:copy-dependencies -DoutputDirectory="${SPARK_HOME}/jars/" \
# cleanup
&& apt-get --purge autoremove -y maven python3-pip \
&& apt-get autoremove -y \
&& apt-get autoclean -y \
&& rm -rf /tmp/* \
&& rm -rf /root/.cache \
&& rm -rf /root/.m2 \
&& rm -rf /var/lib/apt/lists/*
CMD ["/bin/bash"]

143
docker-image/build.sh Normal file

@ -0,0 +1,143 @@
#!/bin/bash
# setup docker to build on /mnt instead of /var/lib/docker
echo '{
"graph": "/mnt",
"storage-driver": "overlay"
}' > /etc/docker/daemon.json
service docker restart
mkdir -p out
# base 1.6.3
docker build base/spark1.6.3/ --tag aztk/spark:v0.1.0-spark1.6.3-base > out/base-spark1.6.3.out &&
docker push aztk/spark:v0.1.0-spark1.6.3-base
# base 2.1.0
docker build base/spark2.1.0/ --tag aztk/spark:v0.1.0-spark2.1.0-base > out/base-spark2.1.0.out &&
docker push aztk/spark:v0.1.0-spark2.1.0-base
# base 2.2.0
docker build base/spark2.2.0/ --tag aztk/spark:v0.1.0-spark2.2.0-base > out/base-spark2.2.0.out &&
docker push aztk/spark:v0.1.0-spark2.2.0-base
# base 2.3.0
docker build base/spark2.3.0/ --tag aztk/spark:v0.1.0-spark2.3.0-base > out/base-spark2.3.0.out &&
docker push aztk/spark:v0.1.0-spark2.3.0-base
# miniconda-base 1.6.3
docker build miniconda/spark1.6.3/base/ --tag aztk/spark:v0.1.0-spark1.6.3-miniconda-base > out/miniconda-spark1.6.3.out &&
docker push aztk/spark:v0.1.0-spark1.6.3-miniconda-base
# miniconda-base 2.1.0
docker build miniconda/spark2.1.0/base/ --tag aztk/spark:v0.1.0-spark2.1.0-miniconda-base > out/miniconda-spark2.1.0.out &&
docker push aztk/spark:v0.1.0-spark2.1.0-miniconda-base
# miniconda-base 2.2.0
docker build miniconda/spark2.2.0/base --tag aztk/spark:v0.1.0-spark2.2.0-miniconda-base > out/miniconda-spark2.2.0.out &&
docker push aztk/spark:v0.1.0-spark2.2.0-miniconda-base
# miniconda-base 2.3.0
docker build miniconda/spark2.3.0/base/ --tag aztk/spark:v0.1.0-spark2.3.0-miniconda-base > out/miniconda-spark2.3.0.out &&
docker push aztk/spark:v0.1.0-spark2.3.0-miniconda-base
# anaconda-base 1.6.3
docker build anaconda/spark1.6.3/base/ --tag aztk/spark:v0.1.0-spark1.6.3-anaconda-base > out/anaconda-spark1.6.3.out &&
docker push aztk/spark:v0.1.0-spark1.6.3-anaconda-base
# anaconda-base 2.1.0
docker build anaconda/spark2.1.0/base/ --tag aztk/spark:v0.1.0-spark2.1.0-anaconda-base > out/anaconda-spark2.1.0.out &&
docker push aztk/spark:v0.1.0-spark2.1.0-anaconda-base
# anaconda-base 2.2.0
docker build anaconda/spark2.2.0/base/ --tag aztk/spark:v0.1.0-spark2.2.0-anaconda-base > out/anaconda-spark2.2.0.out &&
docker push aztk/spark:v0.1.0-spark2.2.0-anaconda-base
# anaconda-base 2.3.0
docker build anaconda/spark2.3.0/base/ --tag aztk/spark:v0.1.0-spark2.3.0-anaconda-base > out/anaconda-spark2.3.0.out &&
docker push aztk/spark:v0.1.0-spark2.3.0-anaconda-base
# r-base 1.6.3
docker build r/spark1.6.3/base/ --tag aztk/spark:v0.1.0-spark1.6.3-r-base > out/r-spark1.6.3.out &&
docker push aztk/spark:v0.1.0-spark1.6.3-r-base
# r-base 2.1.0
docker build r/spark2.1.0/base/ --tag aztk/spark:v0.1.0-spark2.1.0-r-base > out/r-spark2.1.0.out &&
docker push aztk/spark:v0.1.0-spark2.1.0-r-base
# r-base 2.2.0
docker build r/spark2.2.0/base/ --tag aztk/spark:v0.1.0-spark2.2.0-r-base > out/r-spark2.2.0.out &&
docker push aztk/spark:v0.1.0-spark2.2.0-r-base
# r-base 2.3.0
docker build r/spark2.3.0/base/ --tag aztk/spark:v0.1.0-spark2.3.0-r-base > out/r-spark2.3.0.out &&
docker push aztk/spark:v0.1.0-spark2.3.0-r-base
##################
# GPU #
##################
# gpu 1.6.3
docker build gpu/spark1.6.3/ --tag aztk/spark:v0.1.0-spark1.6.3-gpu > out/gpu-spark1.6.3.out &&
docker push aztk/spark:v0.1.0-spark1.6.3-gpu
# gpu 2.1.0
docker build gpu/spark2.1.0/ --tag aztk/spark:v0.1.0-spark2.1.0-gpu > out/gpu-spark2.1.0.out &&
docker push aztk/spark:v0.1.0-spark2.1.0-gpu
# gpu 2.2.0
docker build gpu/spark2.2.0/ --tag aztk/spark:v0.1.0-spark2.2.0-gpu > out/gpu-spark2.2.0.out &&
docker push aztk/spark:v0.1.0-spark2.2.0-gpu
# gpu 2.3.0
docker build gpu/spark2.3.0/ --tag aztk/spark:v0.1.0-spark2.3.0-gpu > out/gpu-spark2.3.0.out &&
docker push aztk/spark:v0.1.0-spark2.3.0-gpu
# miniconda-gpu 1.6.3
docker build miniconda/spark1.6.3/gpu/ --tag aztk/spark:v0.1.0-spark1.6.3-miniconda-gpu > out/miniconda-spark1.6.3.out &&
docker push aztk/spark:v0.1.0-spark1.6.3-miniconda-gpu
# miniconda-gpu 2.1.0
docker build miniconda/spark2.1.0/gpu/ --tag aztk/spark:v0.1.0-spark2.1.0-miniconda-gpu > out/miniconda-spark2.1.0.out &&
docker push aztk/spark:v0.1.0-spark2.1.0-miniconda-gpu
# miniconda-gpu 2.2.0
docker build miniconda/spark2.2.0/gpu --tag aztk/spark:v0.1.0-spark2.2.0-miniconda-gpu > out/miniconda-spark2.2.0.out &&
docker push aztk/spark:v0.1.0-spark2.2.0-miniconda-gpu
# miniconda-gpu 2.3.0
docker build miniconda/spark2.3.0/gpu/ --tag aztk/spark:v0.1.0-spark2.3.0-miniconda-gpu > out/miniconda-spark2.3.0.out &&
docker push aztk/spark:v0.1.0-spark2.3.0-miniconda-gpu
# anaconda-gpu 1.6.3
docker build anaconda/spark1.6.3/gpu/ --tag aztk/spark:v0.1.0-spark1.6.3-anaconda-gpu > out/anaconda-spark1.6.3.out &&
docker push aztk/spark:v0.1.0-spark1.6.3-anaconda-gpu
# anaconda-gpu 2.1.0
docker build anaconda/spark2.1.0/gpu/ --tag aztk/spark:v0.1.0-spark2.1.0-anaconda-gpu > out/anaconda-spark2.1.0.out &&
docker push aztk/spark:v0.1.0-spark2.1.0-anaconda-gpu
# anaconda-gpu 2.2.0
docker build anaconda/spark2.2.0/gpu/ --tag aztk/spark:v0.1.0-spark2.2.0-anaconda-gpu > out/anaconda-spark2.2.0.out &&
docker push aztk/spark:v0.1.0-spark2.2.0-anaconda-gpu
# anaconda-gpu 2.3.0
docker build anaconda/spark2.3.0/gpu/ --tag aztk/spark:v0.1.0-spark2.3.0-anaconda-gpu > out/anaconda-spark2.3.0.out &&
docker push aztk/spark:v0.1.0-spark2.3.0-anaconda-gpu
# r-gpu 1.6.3
docker build r/spark1.6.3/gpu/ --tag aztk/spark:v0.1.0-spark1.6.3-r-gpu > out/r-spark1.6.3.out &&
docker push aztk/spark:v0.1.0-spark1.6.3-r-gpu
# r-gpu 2.1.0
docker build r/spark2.1.0/gpu/ --tag aztk/spark:v0.1.0-spark2.1.0-r-gpu > out/r-spark2.1.0.out &&
docker push aztk/spark:v0.1.0-spark2.1.0-r-gpu
# r-gpu 2.2.0
docker build r/spark2.2.0/gpu/ --tag aztk/spark:v0.1.0-spark2.2.0-r-gpu > out/r-spark2.2.0.out &&
docker push aztk/spark:v0.1.0-spark2.2.0-r-gpu
# r-gpu 2.3.0
docker build r/spark2.3.0/gpu/ --tag aztk/spark:v0.1.0-spark2.3.0-r-gpu > out/r-spark2.3.0.out &&
docker push aztk/spark:v0.1.0-spark2.3.0-r-gpu
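Every image in this script follows the same build-then-push pattern, differing only in context path, tag suffix, and log file. If the repetition becomes a maintenance burden, the pairs could be collapsed into a small helper; this is only a sketch, with paths and tags assumed to match the layout above:

```sh
# Hypothetical helper: build one image, log its output, then push it
build_and_push () {
    local context="$1" tag="$2" log="$3"
    docker build "$context" --tag "aztk/spark:$tag" > "out/$log.out" &&
    docker push "aztk/spark:$tag"
}

# example: the base 2.3.0 image
build_and_push base/spark2.3.0/ v0.1.0-spark2.3.0-base base-spark2.3.0
```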


@ -1,4 +1,4 @@
FROM aztk/base:spark1.6.3
FROM aztk/spark:v0.1.0-spark1.6.3-base
LABEL com.nvidia.volumes.needed="nvidia_driver"


@ -1,4 +1,4 @@
FROM aztk/base:spark2.1.0
FROM aztk/spark:v0.1.0-spark2.1.0-base
LABEL com.nvidia.volumes.needed="nvidia_driver"
@ -76,4 +76,4 @@ ENV NUMBAPRO_CUDALIB /usr/local/cuda-8.0/targets/x86_64-linux/lib/
# RUN pip install --upgrade tensorflow-gpu
WORKDIR $SPARK_HOME
CMD ["bin/spark-class", "org.apache.spark.deploy.master.Master"]
CMD ["bin/spark-class", "org.apache.spark.deploy.master.Master"]


@ -1,4 +1,4 @@
FROM aztk/base:spark2.2.0
FROM aztk/spark:v0.1.0-spark2.2.0-base
LABEL com.nvidia.volumes.needed="nvidia_driver"


@ -1,4 +1,4 @@
FROM aztk/base:latest
FROM aztk/spark:v0.1.0-spark2.3.0-base
LABEL com.nvidia.volumes.needed="nvidia_driver"
@ -76,4 +76,4 @@ ENV NUMBAPRO_CUDALIB /usr/local/cuda-8.0/targets/x86_64-linux/lib/
# RUN pip install --upgrade tensorflow-gpu
WORKDIR $SPARK_HOME
CMD ["bin/spark-class", "org.apache.spark.deploy.master.Master"]
CMD ["bin/spark-class", "org.apache.spark.deploy.master.Master"]


@ -0,0 +1,22 @@
FROM aztk/spark:v0.1.0-spark1.6.3-base
ARG MINICONDA_VERSION=Miniconda3-4.4.10
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PATH /opt/conda/bin:$PATH
RUN apt-get update --fix-missing \
&& apt-get install -y wget bzip2 ca-certificates curl git \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget --quiet https://repo.continuum.io/miniconda/${MINICONDA_VERSION}-Linux-x86_64.sh -O ~/miniconda.sh \
&& /bin/bash ~/miniconda.sh -b -p /opt/conda \
&& rm ~/miniconda.sh \
&& /opt/conda/bin/conda clean -tipsy \
&& ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc
# install extras
# && conda install numba pandas scikit-learn
CMD ["/bin/bash"]
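The `MINICONDA_VERSION` build argument selects which installer is downloaded from repo.continuum.io, so a different Miniconda release can be pinned without editing the Dockerfile. A hypothetical build (the version string is an assumption, not a tested value):

```sh
# Override the Miniconda installer version at build time
docker build miniconda/spark1.6.3/base/ \
    --build-arg MINICONDA_VERSION=Miniconda3-4.3.31 \
    --tag <my_image_tag>
```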


@ -0,0 +1,22 @@
FROM aztk/spark:v0.1.0-spark1.6.3-gpu
ARG MINICONDA_VERSION=Miniconda3-4.4.10
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PATH /opt/conda/bin:$PATH
RUN apt-get update --fix-missing \
&& apt-get install -y wget bzip2 ca-certificates curl git \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget --quiet https://repo.continuum.io/miniconda/${MINICONDA_VERSION}-Linux-x86_64.sh -O ~/miniconda.sh \
&& /bin/bash ~/miniconda.sh -b -p /opt/conda \
&& rm ~/miniconda.sh \
&& /opt/conda/bin/conda clean -tipsy \
&& ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc
# install extras
# && conda install numba pandas scikit-learn
CMD ["/bin/bash"]


@ -0,0 +1,22 @@
FROM aztk/spark:v0.1.0-spark2.1.0-base
ARG MINICONDA_VERSION=Miniconda3-4.4.10
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PATH /opt/conda/bin:$PATH
RUN apt-get update --fix-missing \
&& apt-get install -y wget bzip2 ca-certificates curl git \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget --quiet https://repo.continuum.io/miniconda/${MINICONDA_VERSION}-Linux-x86_64.sh -O ~/miniconda.sh \
&& /bin/bash ~/miniconda.sh -b -p /opt/conda \
&& rm ~/miniconda.sh \
&& /opt/conda/bin/conda clean -tipsy \
&& ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc
# install extras
# && conda install numba pandas scikit-learn
CMD ["/bin/bash"]


@ -0,0 +1,22 @@
FROM aztk/spark:v0.1.0-spark2.1.0-gpu
ARG MINICONDA_VERSION=Miniconda3-4.4.10
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PATH /opt/conda/bin:$PATH
RUN apt-get update --fix-missing \
&& apt-get install -y wget bzip2 ca-certificates curl git \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget --quiet https://repo.continuum.io/miniconda/${MINICONDA_VERSION}-Linux-x86_64.sh -O ~/miniconda.sh \
&& /bin/bash ~/miniconda.sh -b -p /opt/conda \
&& rm ~/miniconda.sh \
&& /opt/conda/bin/conda clean -tipsy \
&& ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc
# install extras
# && conda install numba pandas scikit-learn
CMD ["/bin/bash"]


@ -0,0 +1,22 @@
FROM aztk/spark:v0.1.0-spark2.2.0-base
ARG MINICONDA_VERSION=Miniconda3-4.4.10
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PATH /opt/conda/bin:$PATH
RUN apt-get update --fix-missing \
&& apt-get install -y wget bzip2 ca-certificates curl git \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget --quiet https://repo.continuum.io/miniconda/${MINICONDA_VERSION}-Linux-x86_64.sh -O ~/miniconda.sh \
&& /bin/bash ~/miniconda.sh -b -p /opt/conda \
&& rm ~/miniconda.sh \
&& /opt/conda/bin/conda clean -tipsy \
&& ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc
# install extras
# && conda install numba pandas scikit-learn
CMD ["/bin/bash"]


@ -0,0 +1,22 @@
FROM aztk/spark:v0.1.0-spark2.2.0-gpu
ARG MINICONDA_VERSION=Miniconda3-4.4.10
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PATH /opt/conda/bin:$PATH
RUN apt-get update --fix-missing \
&& apt-get install -y wget bzip2 ca-certificates curl git \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget --quiet https://repo.continuum.io/miniconda/${MINICONDA_VERSION}-Linux-x86_64.sh -O ~/miniconda.sh \
&& /bin/bash ~/miniconda.sh -b -p /opt/conda \
&& rm ~/miniconda.sh \
&& /opt/conda/bin/conda clean -tipsy \
&& ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc
# install extras
# && conda install numba pandas scikit-learn
CMD ["/bin/bash"]


@ -0,0 +1,22 @@
FROM aztk/spark:v0.1.0-spark2.3.0-base
ARG MINICONDA_VERSION=Miniconda3-4.4.10
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PATH /opt/conda/bin:$PATH
RUN apt-get update --fix-missing \
&& apt-get install -y wget bzip2 ca-certificates curl git \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget --quiet https://repo.continuum.io/miniconda/${MINICONDA_VERSION}-Linux-x86_64.sh -O ~/miniconda.sh \
&& /bin/bash ~/miniconda.sh -b -p /opt/conda \
&& rm ~/miniconda.sh \
&& /opt/conda/bin/conda clean -tipsy \
&& ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc
# install extras
# && conda install numba pandas scikit-learn
CMD ["/bin/bash"]


@ -0,0 +1,22 @@
FROM aztk/spark:v0.1.0-spark2.3.0-gpu
ARG MINICONDA_VERSION=Miniconda3-4.4.10
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PATH /opt/conda/bin:$PATH
RUN apt-get update --fix-missing \
&& apt-get install -y wget bzip2 ca-certificates curl git \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget --quiet https://repo.continuum.io/miniconda/${MINICONDA_VERSION}-Linux-x86_64.sh -O ~/miniconda.sh \
&& /bin/bash ~/miniconda.sh -b -p /opt/conda \
&& rm ~/miniconda.sh \
&& /opt/conda/bin/conda clean -tipsy \
&& ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc
# install extras
# && conda install numba pandas scikit-learn
CMD ["/bin/bash"]
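Since `/opt/conda/bin` is prepended to `PATH`, conda and any packages installed into its base environment resolve by default for interactive shells and submitted jobs alike. A quick sanity check one might run against the finished image (the tag is assumed from the build script):

```sh
# Confirm conda is on PATH inside the miniconda image
docker run --rm aztk/spark:v0.1.0-spark2.3.0-miniconda-gpu conda --version
```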


@ -1,23 +0,0 @@
# Python
This Dockerfile is used to build the __aztk-python__ Docker image used by this toolkit. This image uses Anaconda, providing access to a wide range of popular python packages.
You can modify these Dockerfiles to build your own image. However, in most cases, building on top of the __aztk-base__ image is recommended.
NOTE: If you plan to use Jupyter Notebooks with your Spark cluster, we recommend using this image as Jupyter Notebook comes pre-installed with Anaconda.
## How to build this image
This Dockerfile takes in a variable at build time that allows you to specify your desired Anaconda version: **ANACONDA_VERSION**
By default, we set **ANACONDA_VERSION=anaconda3-5.0.0**.
For example, if I wanted to use Anaconda3 v5.0.0 with Spark v2.1.0, I would select the appropriate Dockerfile and build the image as follows:
```sh
# spark2.1.0/Dockerfile
docker build \
--build-arg ANACONDA_VERSION=anaconda3-5.0.0 \
-t <my_image_tag> .
```
**ANACONDA_VERSION** is used to set the version of Anaconda for your cluster.
NOTE: Most versions of Python will work. However, when selecting your Python version, please make sure that it is compatible with your selected version of Spark.


@ -1,14 +0,0 @@
# Ubuntu 16.04 (Xenial)
FROM aztk/base:spark1.6.3
# modify these ARGs on build time to specify your desired versions of Spark/Hadoop
ARG ANACONDA_VERSION=anaconda3-5.0.0
# install user specified version of anaconda
RUN pyenv install -f $ANACONDA_VERSION \
&& pyenv global $ANACONDA_VERSION
# set env vars
ENV USER_PYTHON_VERSION $ANACONDA_VERSION
CMD ["/bin/bash"]


@ -1,14 +0,0 @@
# Ubuntu 16.04 (Xenial)
FROM aztk/gpu:spark1.6.3
# modify these ARGs on build time to specify your desired versions of Spark/Hadoop
ARG ANACONDA_VERSION=anaconda3-5.0.0
# install user specified version of anaconda
RUN pyenv install -f $ANACONDA_VERSION \
&& pyenv global $ANACONDA_VERSION
# set env vars
ENV USER_PYTHON_VERSION $ANACONDA_VERSION
CMD ["/bin/bash"]


@ -1,14 +0,0 @@
# Ubuntu 16.04 (Xenial)
FROM aztk/base:spark2.1.0
# modify these ARGs on build time to specify your desired versions of Spark/Hadoop
ARG ANACONDA_VERSION=anaconda3-5.0.0
# install user specified version of anaconda
RUN pyenv install -f $ANACONDA_VERSION \
&& pyenv global $ANACONDA_VERSION
# set env vars
ENV USER_PYTHON_VERSION $ANACONDA_VERSION
CMD ["/bin/bash"]


@ -1,14 +0,0 @@
# Ubuntu 16.04 (Xenial)
FROM aztk/gpu:spark2.1.0
# modify these ARGs on build time to specify your desired versions of Spark/Hadoop
ARG ANACONDA_VERSION=anaconda3-5.0.0
# install user specified version of anaconda
RUN pyenv install -f $ANACONDA_VERSION \
&& pyenv global $ANACONDA_VERSION
# set env vars
ENV USER_PYTHON_VERSION $ANACONDA_VERSION
CMD ["/bin/bash"]


@ -1,14 +0,0 @@
# Ubuntu 16.04 (Xenial)
FROM aztk/base:spark2.2.0
# modify these ARGs on build time to specify your desired versions of Spark/Hadoop
ARG ANACONDA_VERSION=anaconda3-5.0.0
# install user specified version of anaconda
RUN pyenv install -f $ANACONDA_VERSION \
&& pyenv global $ANACONDA_VERSION
# set env vars
ENV USER_PYTHON_VERSION $ANACONDA_VERSION
CMD ["/bin/bash"]


@ -1,14 +0,0 @@
# Ubuntu 16.04 (Xenial)
FROM aztk/gpu:spark2.2.0
# modify these ARGs on build time to specify your desired versions of Spark/Hadoop
ARG ANACONDA_VERSION=anaconda3-5.0.0
# install user specified version of anaconda
RUN pyenv install -f $ANACONDA_VERSION \
&& pyenv global $ANACONDA_VERSION
# set env vars
ENV USER_PYTHON_VERSION $ANACONDA_VERSION
CMD ["/bin/bash"]


@ -1,126 +1,56 @@
# Ubuntu 16.04 (Xenial)
FROM aztk/base:spark1.6.3
FROM aztk/spark:v0.1.0-spark1.6.3-base
# modify these ARGs on build time to specify your desired versions of Spark/Hadoop
ARG R_VERSION=3.4.1
ARG RSTUDIO_SERVER_VERSION=1.1.383
ARG R_VERSION=3.4.4
ARG R_BASE_VERSION=${R_VERSION}-1xenial0
ARG BUILD_DATE
# set env vars
ENV DEBIAN_FRONTEND noninteractive
ENV BUILD_DATE ${BUILD_DATE:-}
ENV RSTUDIO_SERVER_VERSION $RSTUDIO_SERVER_VERSION
ENV R_VERSION $R_VERSION
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
bash-completion \
ca-certificates \
file \
fonts-texgyre \
g++ \
gfortran \
gsfonts \
libcurl3 \
libopenblas-dev \
libpangocairo-1.0-0 \
libpng16-16 \
locales \
make \
unzip \
zip \
libcurl4-openssl-dev \
libxml2-dev \
libapparmor1 \
gdebi-core \
lsb-release \
psmisc \
sudo \
&& echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8 \
&& BUILDDEPS="libcairo2-dev \
libpango1.0-dev \
libjpeg-dev \
libicu-dev \
libpcre3-dev \
libpng-dev \
libtiff5-dev \
liblzma-dev \
libx11-dev \
libxt-dev \
perl \
tcl8.6-dev \
tk8.6-dev \
texinfo \
texlive-extra-utils \
texlive-fonts-recommended \
texlive-fonts-extra \
texlive-latex-recommended \
x11proto-core-dev \
xauth \
xfonts-base \
xvfb" \
&& apt-get install -y --no-install-recommends $BUILDDEPS \
## Download source code
&& cd tmp/ \
&& majorVersion=$(echo $R_VERSION | cut -f1 -d.) \
&& curl -O https://cran.r-project.org/src/base/R-${majorVersion}/R-${R_VERSION}.tar.gz \
## Extract source code
&& tar -xf R-${R_VERSION}.tar.gz \
&& cd R-${R_VERSION} \
## Set compiler flags
&& R_PAPERSIZE=letter \
R_BATCHSAVE="--no-save --no-restore" \
R_BROWSER=xdg-open \
PAGER=/usr/bin/pager \
PERL=/usr/bin/perl \
R_UNZIPCMD=/usr/bin/unzip \
R_ZIPCMD=/usr/bin/zip \
R_PRINTCMD=/usr/bin/lpr \
LIBnn=lib \
AWK=/usr/bin/awk \
CFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
CXXFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
## Configure options
./configure --enable-R-shlib \
--enable-memory-profiling \
--with-readline \
--with-blas="-lopenblas" \
--disable-nls \
--without-recommended-packages \
## Build and install
&& make \
&& make install \
## Add a default CRAN mirror
&& echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/local/lib/R/etc/Rprofile.site \
&& apt-get install -y --no-install-recommends apt-transport-https \
libxml2-dev \
libcairo2-dev \
libsqlite-dev \
libmariadbd-dev \
libmariadb-client-lgpl-dev \
libpq-dev \
libssh2-1-dev \
libcurl4-openssl-dev \
locales \
&& apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \
&& add-apt-repository 'deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu xenial/' \
&& apt-get update \
&& apt-get install -y --no-install-recommends r-base=${R_BASE_VERSION} r-base-dev=${R_BASE_VERSION}
RUN mkdir -p /usr/lib/R/etc/ \
&& echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/lib/R/etc/Rprofile.site \
## Add a library directory (for user-installed packages)
&& mkdir -p /usr/local/lib/R/site-library \
&& mkdir -p /usr/lib/R/site-library \
## Fix library path
&& echo "R_LIBS_USER='/usr/local/lib/R/site-library'" >> /usr/local/lib/R/etc/Renviron \
&& echo "R_LIBS=\${R_LIBS-'/usr/local/lib/R/site-library:/usr/local/lib/R/library:/usr/lib/R/library'}" >> /usr/local/lib/R/etc/Renviron \
&& echo "R_LIBS_USER='/usr/lib/R/site-library'" >> /usr/lib/R/etc/Renviron \
&& echo "R_LIBS=\${R_LIBS-'/usr/lib/R/site-library:/usr/lib/R/library:/usr/lib/R/library'}" >> /usr/lib/R/etc/Renviron \
## install packages from date-locked MRAN snapshot of CRAN
&& [ -z "$BUILD_DATE" ] && BUILD_DATE=$(TZ="America/Los_Angeles" date -I) || true \
&& MRAN=https://mran.microsoft.com/snapshot/${BUILD_DATE} \
&& echo MRAN=$MRAN >> /etc/environment \
&& export MRAN=$MRAN \
&& echo "options(repos = c(CRAN='$MRAN'), download.file.method = 'libcurl'); Sys.setenv(SPARK_HOME ='"$SPARK_HOME"')" >> /usr/local/lib/R/etc/Rprofile.site \
&& echo "options(repos = c(CRAN='$MRAN'), download.file.method = 'libcurl'); Sys.setenv(SPARK_HOME ='"$SPARK_HOME"')" >> /usr/lib/R/etc/Rprofile.site \
## Use littler installation scripts
&& Rscript -e "install.packages(c('littler', 'docopt', 'tidyverse', 'sparklyr'), repo = '$MRAN')" \
&& chown -R root:staff /usr/local/lib/R/site-library \
&& chmod -R g+wx /usr/local/lib/R/site-library \
&& ln -s /usr/local/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
&& ln -s /usr/local/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
&& ln -s /usr/local/lib/R/site-library/littler/bin/r /usr/local/bin/r \
## TEMPORARY WORKAROUND to get more robust error handling for install2.r prior to littler update
&& curl -O /usr/local/bin/install2.r https://github.com/eddelbuettel/littler/raw/master/inst/examples/install2.r \
&& chmod +x /usr/local/bin/install2.r \
&& Rscript -e "install.packages(c('dplyr', 'docopt', 'tidyverse', 'sparklyr'), repo = '$MRAN', dependencies=TRUE)" \
&& chown -R root:staff /usr/lib/R/site-library \
&& chmod -R g+wx /usr/lib/R/site-library \
&& ln -s /usr/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
&& ln -s /usr/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
&& ln -s /usr/lib/R/site-library/littler/bin/r /usr/local/bin/r \
## Clean up from R source install
&& cd / \
&& rm -rf /tmp/* \
&& apt-get autoremove -y \
&& apt-get autoclean -y \
&& rm -rf /var/lib/apt/lists/*
&& apt-get autoclean -y
CMD ["/bin/bash"]
RUN rm /usr/bin/python \
&& ln -s /usr/bin/python3.5 /usr/bin/python
RUN echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8
CMD ["/bin/bash"]
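With R installed from the CRAN apt repository and the packages pulled from the date-locked MRAN snapshot, the image should come up with sparklyr ready to use. A hypothetical check (the tag is assumed from the build script):

```sh
# Verify the sparklyr install inside the R image
docker run --rm aztk/spark:v0.1.0-spark1.6.3-r-base \
    Rscript -e 'packageVersion("sparklyr")'
```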


@ -1,142 +1,56 @@
# Ubuntu 16.04 (Xenial)
FROM aztk/gpu:spark1.6.3
FROM aztk/spark:v0.1.0-spark1.6.3-gpu
# modify these ARGs on build time to specify your desired versions of Spark/Hadoop
ARG R_VERSION=3.4.1
ARG RSTUDIO_SERVER_VERSION=1.1.383
ARG TENSORFLOW_VERSION=tensorflow-gpu
ARG CNTK_VERSION=https://cntk.ai/PythonWheel/GPU/cntk-2.3.1-cp35-cp35m-linux_x86_64.whl
ARG R_VERSION=3.4.4
ARG R_BASE_VERSION=${R_VERSION}-1xenial0
ARG BUILD_DATE
# set env vars
ENV DEBIAN_FRONTEND noninteractive
ENV BUILD_DATE ${BUILD_DATE:-}
ENV RSTUDIO_SERVER_VERSION $RSTUDIO_SERVER_VERSION
ENV R_VERSION $R_VERSION
RUN useradd -m -d /home/rstudio rstudio -G sudo,staff \
&& echo rstudio:rstudio | chpasswd \
&& chmod -R 777 /home/rstudio \
&& chmod -R 777 //.pyenv/
# Setting up rstudio user with Tensorflow and CNTK
USER rstudio
RUN echo "PATH='"$PATH"'" > /home/rstudio/.Renviron \
&& pip3 install \
$CNTK_VERSION \
$TENSORFLOW_VERSION \
keras
USER root
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
bash-completion \
ca-certificates \
file \
fonts-texgyre \
g++ \
gfortran \
gsfonts \
libcurl3 \
libopenblas-dev \
libpangocairo-1.0-0 \
libpng16-16 \
locales \
make \
unzip \
zip \
libcurl4-openssl-dev \
libxml2-dev \
libapparmor1 \
gdebi-core \
lsb-release \
psmisc \
sudo \
openmpi-bin \
&& echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8 \
&& BUILDDEPS="libcairo2-dev \
libpango1.0-dev \
libjpeg-dev \
libicu-dev \
libpcre3-dev \
libpng-dev \
libtiff5-dev \
liblzma-dev \
libx11-dev \
libxt-dev \
perl \
tcl8.6-dev \
tk8.6-dev \
texinfo \
texlive-extra-utils \
texlive-fonts-recommended \
texlive-fonts-extra \
texlive-latex-recommended \
x11proto-core-dev \
xauth \
xfonts-base \
xvfb" \
&& apt-get install -y --no-install-recommends $BUILDDEPS \
## Download source code
&& cd /tmp/ \
&& majorVersion=$(echo $R_VERSION | cut -f1 -d.) \
&& curl -O https://cran.r-project.org/src/base/R-${majorVersion}/R-${R_VERSION}.tar.gz \
## Extract source code
&& tar -xf R-${R_VERSION}.tar.gz \
&& cd R-${R_VERSION} \
## Set compiler flags
&& R_PAPERSIZE=letter \
R_BATCHSAVE="--no-save --no-restore" \
R_BROWSER=xdg-open \
PAGER=/usr/bin/pager \
PERL=/usr/bin/perl \
R_UNZIPCMD=/usr/bin/unzip \
R_ZIPCMD=/usr/bin/zip \
R_PRINTCMD=/usr/bin/lpr \
LIBnn=lib \
AWK=/usr/bin/awk \
CFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
CXXFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
## Configure options
./configure --enable-R-shlib \
--enable-memory-profiling \
--with-readline \
--with-blas="-lopenblas" \
--disable-nls \
--without-recommended-packages \
## Build and install
&& make \
&& make install \
## Add a default CRAN mirror
&& echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/local/lib/R/etc/Rprofile.site \
&& apt-get install -y --no-install-recommends apt-transport-https \
libxml2-dev \
libcairo2-dev \
libsqlite-dev \
libmariadbd-dev \
libmariadb-client-lgpl-dev \
libpq-dev \
libssh2-1-dev \
libcurl4-openssl-dev \
locales \
&& apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \
&& add-apt-repository 'deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu xenial/' \
&& apt-get update \
&& apt-get install -y --no-install-recommends r-base=${R_BASE_VERSION} r-base-dev=${R_BASE_VERSION}
RUN mkdir -p /usr/lib/R/etc/ \
&& echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/lib/R/etc/Rprofile.site \
## Add a library directory (for user-installed packages)
&& mkdir -p /usr/local/lib/R/site-library \
&& mkdir -p /usr/lib/R/site-library \
## Fix library path
&& echo "R_LIBS_USER='/usr/local/lib/R/site-library'" >> /usr/local/lib/R/etc/Renviron \
&& echo "R_LIBS=\${R_LIBS-'/usr/local/lib/R/site-library:/usr/local/lib/R/library:/usr/lib/R/library'}" >> /usr/local/lib/R/etc/Renviron \
&& echo "R_LIBS_USER='/usr/lib/R/site-library'" >> /usr/lib/R/etc/Renviron \
&& echo "R_LIBS=\${R_LIBS-'/usr/lib/R/site-library:/usr/lib/R/library:/usr/lib/R/library'}" >> /usr/lib/R/etc/Renviron \
## install packages from date-locked MRAN snapshot of CRAN
&& [ -z "$BUILD_DATE" ] && BUILD_DATE=$(TZ="America/Los_Angeles" date -I) || true \
&& MRAN=https://mran.microsoft.com/snapshot/${BUILD_DATE} \
&& echo MRAN=$MRAN >> /etc/environment \
&& export MRAN=$MRAN \
&& echo "options(repos = c(CRAN='$MRAN'), download.file.method = 'libcurl');" >> /usr/local/lib/R/etc/Rprofile.site \
&& echo "Sys.setenv(SPARK_HOME ='"$SPARK_HOME"');" >> /usr/local/lib/R/etc/Rprofile.site \
&& Rscript -e "install.packages(c('littler', 'docopt', 'tidyverse', 'sparklyr', 'keras', 'tensorflow'), repo = '$MRAN')" \
&& chown -R root:staff /usr/local/lib/R/site-library \
&& chmod -R g+wx /usr/local/lib/R/site-library \
&& ln -s /usr/local/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
&& ln -s /usr/local/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
&& ln -s /usr/local/lib/R/site-library/littler/bin/r /usr/local/bin/r \
## TEMPORARY WORKAROUND to get more robust error handling for install2.r prior to littler update
&& curl -O /usr/local/bin/install2.r https://github.com/eddelbuettel/littler/raw/master/inst/examples/install2.r \
&& chmod +x /usr/local/bin/install2.r \
&& echo "options(repos = c(CRAN='$MRAN'), download.file.method = 'libcurl'); Sys.setenv(SPARK_HOME ='"$SPARK_HOME"')" >> /usr/lib/R/etc/Rprofile.site \
## Use littler installation scripts
&& Rscript -e "install.packages(c('dplyr', 'docopt', 'tidyverse', 'sparklyr'), repo = '$MRAN', dependencies=TRUE)" \
&& chown -R root:staff /usr/lib/R/site-library \
&& chmod -R g+wx /usr/lib/R/site-library \
&& ln -s /usr/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
&& ln -s /usr/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
&& ln -s /usr/lib/R/site-library/littler/bin/r /usr/local/bin/r \
## Clean up from R source install
&& cd / \
&& rm -rf /tmp/* \
&& apt-get autoremove -y \
&& apt-get autoclean -y \
&& rm -rf /var/lib/apt/lists/*
&& apt-get autoclean -y
RUN rm /usr/bin/python \
&& ln -s /usr/bin/python3.5 /usr/bin/python
RUN echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8
CMD ["/bin/bash"]


@ -1,126 +1,56 @@
# Ubuntu 16.04 (Xenial)
FROM aztk/base:spark2.1.0
FROM aztk/spark:v0.1.0-spark2.1.0-base
# modify these ARGs on build time to specify your desired versions of Spark/Hadoop
ARG R_VERSION=3.4.1
ARG RSTUDIO_SERVER_VERSION=1.1.383
ARG R_VERSION=3.4.4
ARG R_BASE_VERSION=${R_VERSION}-1xenial0
ARG BUILD_DATE
# set env vars
ENV DEBIAN_FRONTEND noninteractive
ENV BUILD_DATE ${BUILD_DATE:-}
ENV RSTUDIO_SERVER_VERSION $RSTUDIO_SERVER_VERSION
ENV R_VERSION $R_VERSION
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
bash-completion \
ca-certificates \
file \
fonts-texgyre \
g++ \
gfortran \
gsfonts \
libcurl3 \
libopenblas-dev \
libpangocairo-1.0-0 \
libpng16-16 \
locales \
make \
unzip \
zip \
libcurl4-openssl-dev \
libxml2-dev \
libapparmor1 \
gdebi-core \
lsb-release \
psmisc \
sudo \
&& echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8 \
&& BUILDDEPS="libcairo2-dev \
libpango1.0-dev \
libjpeg-dev \
libicu-dev \
libpcre3-dev \
libpng-dev \
libtiff5-dev \
liblzma-dev \
libx11-dev \
libxt-dev \
perl \
tcl8.6-dev \
tk8.6-dev \
texinfo \
texlive-extra-utils \
texlive-fonts-recommended \
texlive-fonts-extra \
texlive-latex-recommended \
x11proto-core-dev \
xauth \
xfonts-base \
xvfb" \
&& apt-get install -y --no-install-recommends $BUILDDEPS \
## Download source code
&& cd tmp/ \
&& majorVersion=$(echo $R_VERSION | cut -f1 -d.) \
&& curl -O https://cran.r-project.org/src/base/R-${majorVersion}/R-${R_VERSION}.tar.gz \
## Extract source code
&& tar -xf R-${R_VERSION}.tar.gz \
&& cd R-${R_VERSION} \
## Set compiler flags
&& R_PAPERSIZE=letter \
R_BATCHSAVE="--no-save --no-restore" \
R_BROWSER=xdg-open \
PAGER=/usr/bin/pager \
PERL=/usr/bin/perl \
R_UNZIPCMD=/usr/bin/unzip \
R_ZIPCMD=/usr/bin/zip \
R_PRINTCMD=/usr/bin/lpr \
LIBnn=lib \
AWK=/usr/bin/awk \
CFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
CXXFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
## Configure options
./configure --enable-R-shlib \
--enable-memory-profiling \
--with-readline \
--with-blas="-lopenblas" \
--disable-nls \
--without-recommended-packages \
## Build and install
&& make \
&& make install \
## Add a default CRAN mirror
&& echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/local/lib/R/etc/Rprofile.site \
&& apt-get install -y --no-install-recommends apt-transport-https \
libxml2-dev \
libcairo2-dev \
libsqlite-dev \
libmariadbd-dev \
libmariadb-client-lgpl-dev \
libpq-dev \
libssh2-1-dev \
libcurl4-openssl-dev \
locales \
&& apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \
&& add-apt-repository 'deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu xenial/' \
&& apt-get update \
&& apt-get install -y --no-install-recommends r-base=${R_BASE_VERSION} r-base-dev=${R_BASE_VERSION}
RUN mkdir -p /usr/lib/R/etc/ \
&& echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/lib/R/etc/Rprofile.site \
## Add a library directory (for user-installed packages)
&& mkdir -p /usr/local/lib/R/site-library \
&& mkdir -p /usr/lib/R/site-library \
## Fix library path
&& echo "R_LIBS_USER='/usr/local/lib/R/site-library'" >> /usr/local/lib/R/etc/Renviron \
&& echo "R_LIBS=\${R_LIBS-'/usr/local/lib/R/site-library:/usr/local/lib/R/library:/usr/lib/R/library'}" >> /usr/local/lib/R/etc/Renviron \
&& echo "R_LIBS_USER='/usr/lib/R/site-library'" >> /usr/lib/R/etc/Renviron \
&& echo "R_LIBS=\${R_LIBS-'/usr/lib/R/site-library:/usr/lib/R/library:/usr/lib/R/library'}" >> /usr/lib/R/etc/Renviron \
## install packages from date-locked MRAN snapshot of CRAN
&& [ -z "$BUILD_DATE" ] && BUILD_DATE=$(TZ="America/Los_Angeles" date -I) || true \
&& MRAN=https://mran.microsoft.com/snapshot/${BUILD_DATE} \
&& echo MRAN=$MRAN >> /etc/environment \
&& export MRAN=$MRAN \
&& echo "options(repos = c(CRAN='$MRAN'), download.file.method = 'libcurl'); Sys.setenv(SPARK_HOME ='"$SPARK_HOME"')" >> /usr/local/lib/R/etc/Rprofile.site \
&& echo "options(repos = c(CRAN='$MRAN'), download.file.method = 'libcurl'); Sys.setenv(SPARK_HOME ='"$SPARK_HOME"')" >> /usr/lib/R/etc/Rprofile.site \
## Use littler installation scripts
&& Rscript -e "install.packages(c('littler', 'docopt', 'tidyverse', 'sparklyr'), repo = '$MRAN')" \
&& chown -R root:staff /usr/local/lib/R/site-library \
&& chmod -R g+wx /usr/local/lib/R/site-library \
&& ln -s /usr/local/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
&& ln -s /usr/local/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
&& ln -s /usr/local/lib/R/site-library/littler/bin/r /usr/local/bin/r \
## TEMPORARY WORKAROUND to get more robust error handling for install2.r prior to littler update
&& curl -O /usr/local/bin/install2.r https://github.com/eddelbuettel/littler/raw/master/inst/examples/install2.r \
&& chmod +x /usr/local/bin/install2.r \
&& Rscript -e "install.packages(c('dplyr', 'docopt', 'tidyverse', 'sparklyr'), repo = '$MRAN', dependencies=TRUE)" \
&& chown -R root:staff /usr/lib/R/site-library \
&& chmod -R g+wx /usr/lib/R/site-library \
&& ln -s /usr/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
&& ln -s /usr/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
&& ln -s /usr/lib/R/site-library/littler/bin/r /usr/local/bin/r \
## Clean up from R source install
&& cd / \
&& rm -rf /tmp/* \
&& apt-get autoremove -y \
&& apt-get autoclean -y \
&& rm -rf /var/lib/apt/lists/*
&& apt-get autoclean -y
CMD ["/bin/bash"]
RUN rm /usr/bin/python \
&& ln -s /usr/bin/python3.5 /usr/bin/python
RUN echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8
CMD ["/bin/bash"]


@ -1,142 +1,56 @@
# Ubuntu 16.04 (Xenial)
FROM aztk/gpu:spark2.1.0
FROM aztk/spark:v0.1.0-spark2.1.0-gpu
# modify these ARGs on build time to specify your desired versions of Spark/Hadoop
ARG R_VERSION=3.4.1
ARG RSTUDIO_SERVER_VERSION=1.1.383
ARG TENSORFLOW_VERSION=tensorflow-gpu
ARG CNTK_VERSION=https://cntk.ai/PythonWheel/GPU/cntk-2.3.1-cp35-cp35m-linux_x86_64.whl
ARG R_VERSION=3.4.4
ARG R_BASE_VERSION=${R_VERSION}-1xenial0
ARG BUILD_DATE
# set env vars
ENV DEBIAN_FRONTEND noninteractive
ENV BUILD_DATE ${BUILD_DATE:-}
ENV RSTUDIO_SERVER_VERSION $RSTUDIO_SERVER_VERSION
ENV R_VERSION $R_VERSION
RUN useradd -m -d /home/rstudio rstudio -G sudo,staff \
&& echo rstudio:rstudio | chpasswd \
&& chmod -R 777 /home/rstudio \
&& chmod -R 777 //.pyenv/
# Setting up rstudio user with Tensorflow and CNTK
USER rstudio
RUN echo "PATH='"$PATH"'" > /home/rstudio/.Renviron \
&& pip3 install \
$CNTK_VERSION \
$TENSORFLOW_VERSION \
keras
USER root
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
bash-completion \
ca-certificates \
file \
fonts-texgyre \
g++ \
gfortran \
gsfonts \
libcurl3 \
libopenblas-dev \
libpangocairo-1.0-0 \
libpng16-16 \
locales \
make \
unzip \
zip \
libcurl4-openssl-dev \
libxml2-dev \
libapparmor1 \
gdebi-core \
lsb-release \
psmisc \
sudo \
openmpi-bin \
&& echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8 \
&& BUILDDEPS="libcairo2-dev \
libpango1.0-dev \
libjpeg-dev \
libicu-dev \
libpcre3-dev \
libpng-dev \
libtiff5-dev \
liblzma-dev \
libx11-dev \
libxt-dev \
perl \
tcl8.6-dev \
tk8.6-dev \
texinfo \
texlive-extra-utils \
texlive-fonts-recommended \
texlive-fonts-extra \
texlive-latex-recommended \
x11proto-core-dev \
xauth \
xfonts-base \
xvfb" \
&& apt-get install -y --no-install-recommends $BUILDDEPS \
## Download source code
&& cd /tmp/ \
&& majorVersion=$(echo $R_VERSION | cut -f1 -d.) \
&& curl -O https://cran.r-project.org/src/base/R-${majorVersion}/R-${R_VERSION}.tar.gz \
## Extract source code
&& tar -xf R-${R_VERSION}.tar.gz \
&& cd R-${R_VERSION} \
## Set compiler flags
&& R_PAPERSIZE=letter \
R_BATCHSAVE="--no-save --no-restore" \
R_BROWSER=xdg-open \
PAGER=/usr/bin/pager \
PERL=/usr/bin/perl \
R_UNZIPCMD=/usr/bin/unzip \
R_ZIPCMD=/usr/bin/zip \
R_PRINTCMD=/usr/bin/lpr \
LIBnn=lib \
AWK=/usr/bin/awk \
CFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
CXXFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
## Configure options
./configure --enable-R-shlib \
--enable-memory-profiling \
--with-readline \
--with-blas="-lopenblas" \
--disable-nls \
--without-recommended-packages \
## Build and install
&& make \
&& make install \
## Add a default CRAN mirror
&& echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/local/lib/R/etc/Rprofile.site \
&& apt-get install -y --no-install-recommends apt-transport-https \
libxml2-dev \
libcairo2-dev \
libsqlite-dev \
libmariadbd-dev \
libmariadb-client-lgpl-dev \
libpq-dev \
libssh2-1-dev \
libcurl4-openssl-dev \
locales \
&& apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \
&& add-apt-repository 'deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu xenial/' \
&& apt-get update \
&& apt-get install -y --no-install-recommends r-base=${R_BASE_VERSION} r-base-dev=${R_BASE_VERSION}
RUN mkdir -p /usr/lib/R/etc/ \
&& echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/lib/R/etc/Rprofile.site \
## Add a library directory (for user-installed packages)
&& mkdir -p /usr/local/lib/R/site-library \
&& mkdir -p /usr/lib/R/site-library \
## Fix library path
&& echo "R_LIBS_USER='/usr/local/lib/R/site-library'" >> /usr/local/lib/R/etc/Renviron \
&& echo "R_LIBS=\${R_LIBS-'/usr/local/lib/R/site-library:/usr/local/lib/R/library:/usr/lib/R/library'}" >> /usr/local/lib/R/etc/Renviron \
&& echo "R_LIBS_USER='/usr/lib/R/site-library'" >> /usr/lib/R/etc/Renviron \
&& echo "R_LIBS=\${R_LIBS-'/usr/lib/R/site-library:/usr/lib/R/library:/usr/lib/R/library'}" >> /usr/lib/R/etc/Renviron \
## install packages from date-locked MRAN snapshot of CRAN
&& [ -z "$BUILD_DATE" ] && BUILD_DATE=$(TZ="America/Los_Angeles" date -I) || true \
&& MRAN=https://mran.microsoft.com/snapshot/${BUILD_DATE} \
&& echo MRAN=$MRAN >> /etc/environment \
&& export MRAN=$MRAN \
&& echo "options(repos = c(CRAN='$MRAN'), download.file.method = 'libcurl');" >> /usr/local/lib/R/etc/Rprofile.site \
&& echo "Sys.setenv(SPARK_HOME ='"$SPARK_HOME"');" >> /usr/local/lib/R/etc/Rprofile.site \
&& Rscript -e "install.packages(c('littler', 'docopt', 'tidyverse', 'sparklyr', 'keras', 'tensorflow'), repo = '$MRAN')" \
&& chown -R root:staff /usr/local/lib/R/site-library \
&& chmod -R g+wx /usr/local/lib/R/site-library \
&& ln -s /usr/local/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
&& ln -s /usr/local/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
&& ln -s /usr/local/lib/R/site-library/littler/bin/r /usr/local/bin/r \
## TEMPORARY WORKAROUND to get more robust error handling for install2.r prior to littler update
&& curl -O /usr/local/bin/install2.r https://github.com/eddelbuettel/littler/raw/master/inst/examples/install2.r \
&& chmod +x /usr/local/bin/install2.r \
&& echo "options(repos = c(CRAN='$MRAN'), download.file.method = 'libcurl'); Sys.setenv(SPARK_HOME ='"$SPARK_HOME"')" >> /usr/lib/R/etc/Rprofile.site \
## Use littler installation scripts
&& Rscript -e "install.packages(c('dplyr', 'docopt', 'tidyverse', 'sparklyr'), repo = '$MRAN', dependencies=TRUE)" \
&& chown -R root:staff /usr/lib/R/site-library \
&& chmod -R g+wx /usr/lib/R/site-library \
&& ln -s /usr/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
&& ln -s /usr/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
&& ln -s /usr/lib/R/site-library/littler/bin/r /usr/local/bin/r \
## Clean up from R source install
&& cd / \
&& rm -rf /tmp/* \
&& apt-get autoremove -y \
&& apt-get autoclean -y \
&& rm -rf /var/lib/apt/lists/*
&& apt-get autoclean -y
RUN rm /usr/bin/python \
&& ln -s /usr/bin/python3.5 /usr/bin/python
RUN echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8
CMD ["/bin/bash"]


@ -1,126 +1,56 @@
# Ubuntu 16.04 (Xenial)
FROM aztk/base:spark2.2.0
FROM aztk/spark:v0.1.0-spark2.2.0-base
# modify these ARGs on build time to specify your desired versions of Spark/Hadoop
ARG R_VERSION=3.4.1
ARG RSTUDIO_SERVER_VERSION=1.1.383
ARG R_VERSION=3.4.4
ARG R_BASE_VERSION=${R_VERSION}-1xenial0
ARG BUILD_DATE
# set env vars
ENV DEBIAN_FRONTEND noninteractive
ENV BUILD_DATE ${BUILD_DATE:-}
ENV RSTUDIO_SERVER_VERSION $RSTUDIO_SERVER_VERSION
ENV R_VERSION $R_VERSION
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
bash-completion \
ca-certificates \
file \
fonts-texgyre \
g++ \
gfortran \
gsfonts \
libcurl3 \
libopenblas-dev \
libpangocairo-1.0-0 \
libpng16-16 \
locales \
make \
unzip \
zip \
libcurl4-openssl-dev \
libxml2-dev \
libapparmor1 \
gdebi-core \
lsb-release \
psmisc \
sudo \
&& echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8 \
&& BUILDDEPS="libcairo2-dev \
libpango1.0-dev \
libjpeg-dev \
libicu-dev \
libpcre3-dev \
libpng-dev \
libtiff5-dev \
liblzma-dev \
libx11-dev \
libxt-dev \
perl \
tcl8.6-dev \
tk8.6-dev \
texinfo \
texlive-extra-utils \
texlive-fonts-recommended \
texlive-fonts-extra \
texlive-latex-recommended \
x11proto-core-dev \
xauth \
xfonts-base \
xvfb" \
&& apt-get install -y --no-install-recommends $BUILDDEPS \
## Download source code
&& cd tmp/ \
&& majorVersion=$(echo $R_VERSION | cut -f1 -d.) \
&& curl -O https://cran.r-project.org/src/base/R-${majorVersion}/R-${R_VERSION}.tar.gz \
## Extract source code
&& tar -xf R-${R_VERSION}.tar.gz \
&& cd R-${R_VERSION} \
## Set compiler flags
&& R_PAPERSIZE=letter \
R_BATCHSAVE="--no-save --no-restore" \
R_BROWSER=xdg-open \
PAGER=/usr/bin/pager \
PERL=/usr/bin/perl \
R_UNZIPCMD=/usr/bin/unzip \
R_ZIPCMD=/usr/bin/zip \
R_PRINTCMD=/usr/bin/lpr \
LIBnn=lib \
AWK=/usr/bin/awk \
CFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
CXXFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
## Configure options
./configure --enable-R-shlib \
--enable-memory-profiling \
--with-readline \
--with-blas="-lopenblas" \
--disable-nls \
--without-recommended-packages \
## Build and install
&& make \
&& make install \
## Add a default CRAN mirror
&& echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/local/lib/R/etc/Rprofile.site \
&& apt-get install -y --no-install-recommends apt-transport-https \
libxml2-dev \
libcairo2-dev \
libsqlite-dev \
libmariadbd-dev \
libmariadb-client-lgpl-dev \
libpq-dev \
libssh2-1-dev \
libcurl4-openssl-dev \
locales \
&& apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \
&& add-apt-repository 'deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu xenial/' \
&& apt-get update \
&& apt-get install -y --no-install-recommends r-base=${R_BASE_VERSION} r-base-dev=${R_BASE_VERSION}
RUN mkdir -p /usr/lib/R/etc/ \
&& echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/lib/R/etc/Rprofile.site \
## Add a library directory (for user-installed packages)
&& mkdir -p /usr/local/lib/R/site-library \
&& mkdir -p /usr/lib/R/site-library \
## Fix library path
&& echo "R_LIBS_USER='/usr/local/lib/R/site-library'" >> /usr/local/lib/R/etc/Renviron \
&& echo "R_LIBS=\${R_LIBS-'/usr/local/lib/R/site-library:/usr/local/lib/R/library:/usr/lib/R/library'}" >> /usr/local/lib/R/etc/Renviron \
&& echo "R_LIBS_USER='/usr/lib/R/site-library'" >> /usr/lib/R/etc/Renviron \
&& echo "R_LIBS=\${R_LIBS-'/usr/lib/R/site-library:/usr/lib/R/library:/usr/lib/R/library'}" >> /usr/lib/R/etc/Renviron \
## install packages from date-locked MRAN snapshot of CRAN
&& [ -z "$BUILD_DATE" ] && BUILD_DATE=$(TZ="America/Los_Angeles" date -I) || true \
&& MRAN=https://mran.microsoft.com/snapshot/${BUILD_DATE} \
&& echo MRAN=$MRAN >> /etc/environment \
&& export MRAN=$MRAN \
&& echo "options(repos = c(CRAN='$MRAN'), download.file.method = 'libcurl'); Sys.setenv(SPARK_HOME ='"$SPARK_HOME"')" >> /usr/local/lib/R/etc/Rprofile.site \
&& echo "options(repos = c(CRAN='$MRAN'), download.file.method = 'libcurl'); Sys.setenv(SPARK_HOME ='"$SPARK_HOME"')" >> /usr/lib/R/etc/Rprofile.site \
## Use littler installation scripts
&& Rscript -e "install.packages(c('littler', 'docopt', 'tidyverse', 'sparklyr'), repo = '$MRAN')" \
&& chown -R root:staff /usr/local/lib/R/site-library \
&& chmod -R g+wx /usr/local/lib/R/site-library \
&& ln -s /usr/local/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
&& ln -s /usr/local/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
&& ln -s /usr/local/lib/R/site-library/littler/bin/r /usr/local/bin/r \
## TEMPORARY WORKAROUND to get more robust error handling for install2.r prior to littler update
&& curl -O /usr/local/bin/install2.r https://github.com/eddelbuettel/littler/raw/master/inst/examples/install2.r \
&& chmod +x /usr/local/bin/install2.r \
&& Rscript -e "install.packages(c('dplyr', 'docopt', 'tidyverse', 'sparklyr'), repo = '$MRAN', dependencies=TRUE)" \
&& chown -R root:staff /usr/lib/R/site-library \
&& chmod -R g+wx /usr/lib/R/site-library \
&& ln -s /usr/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
&& ln -s /usr/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
&& ln -s /usr/lib/R/site-library/littler/bin/r /usr/local/bin/r \
## Clean up from R source install
&& cd / \
&& rm -rf /tmp/* \
&& apt-get autoremove -y \
&& apt-get autoclean -y \
&& rm -rf /var/lib/apt/lists/*
&& apt-get autoclean -y
CMD ["/bin/bash"]
RUN rm /usr/bin/python \
&& ln -s /usr/bin/python3.5 /usr/bin/python
RUN echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8
CMD ["/bin/bash"]


@ -1,142 +1,56 @@
# Ubuntu 16.04 (Xenial)
FROM aztk/gpu:spark2.2.0
FROM aztk/spark:v0.1.0-spark2.2.0-gpu
# modify these ARGs on build time to specify your desired versions of Spark/Hadoop
ARG R_VERSION=3.4.1
ARG RSTUDIO_SERVER_VERSION=1.1.383
ARG TENSORFLOW_VERSION=tensorflow-gpu
ARG CNTK_VERSION=https://cntk.ai/PythonWheel/GPU/cntk-2.3.1-cp35-cp35m-linux_x86_64.whl
ARG R_VERSION=3.4.4
ARG R_BASE_VERSION=${R_VERSION}-1xenial0
ARG BUILD_DATE
# set env vars
ENV DEBIAN_FRONTEND noninteractive
ENV BUILD_DATE ${BUILD_DATE:-}
ENV RSTUDIO_SERVER_VERSION $RSTUDIO_SERVER_VERSION
ENV R_VERSION $R_VERSION
RUN useradd -m -d /home/rstudio rstudio -G sudo,staff \
&& echo rstudio:rstudio | chpasswd \
&& chmod -R 777 /home/rstudio \
&& chmod -R 777 //.pyenv/
# Setting up rstudio user with Tensorflow and CNTK
USER rstudio
RUN echo "PATH='"$PATH"'" > /home/rstudio/.Renviron \
&& pip3 install \
$CNTK_VERSION \
$TENSORFLOW_VERSION \
keras
USER root
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
bash-completion \
ca-certificates \
file \
fonts-texgyre \
g++ \
gfortran \
gsfonts \
libcurl3 \
libopenblas-dev \
libpangocairo-1.0-0 \
libpng16-16 \
locales \
make \
unzip \
zip \
libcurl4-openssl-dev \
libxml2-dev \
libapparmor1 \
gdebi-core \
lsb-release \
psmisc \
sudo \
openmpi-bin \
&& echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8 \
&& BUILDDEPS="libcairo2-dev \
libpango1.0-dev \
libjpeg-dev \
libicu-dev \
libpcre3-dev \
libpng-dev \
libtiff5-dev \
liblzma-dev \
libx11-dev \
libxt-dev \
perl \
tcl8.6-dev \
tk8.6-dev \
texinfo \
texlive-extra-utils \
texlive-fonts-recommended \
texlive-fonts-extra \
texlive-latex-recommended \
x11proto-core-dev \
xauth \
xfonts-base \
xvfb" \
&& apt-get install -y --no-install-recommends $BUILDDEPS \
## Download source code
&& cd /tmp/ \
&& majorVersion=$(echo $R_VERSION | cut -f1 -d.) \
&& curl -O https://cran.r-project.org/src/base/R-${majorVersion}/R-${R_VERSION}.tar.gz \
## Extract source code
&& tar -xf R-${R_VERSION}.tar.gz \
&& cd R-${R_VERSION} \
## Set compiler flags
&& R_PAPERSIZE=letter \
R_BATCHSAVE="--no-save --no-restore" \
R_BROWSER=xdg-open \
PAGER=/usr/bin/pager \
PERL=/usr/bin/perl \
R_UNZIPCMD=/usr/bin/unzip \
R_ZIPCMD=/usr/bin/zip \
R_PRINTCMD=/usr/bin/lpr \
LIBnn=lib \
AWK=/usr/bin/awk \
CFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
CXXFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
## Configure options
./configure --enable-R-shlib \
--enable-memory-profiling \
--with-readline \
--with-blas="-lopenblas" \
--disable-nls \
--without-recommended-packages \
## Build and install
&& make \
&& make install \
## Add a default CRAN mirror
&& echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/local/lib/R/etc/Rprofile.site \
&& apt-get install -y --no-install-recommends apt-transport-https \
libxml2-dev \
libcairo2-dev \
libsqlite-dev \
libmariadbd-dev \
libmariadb-client-lgpl-dev \
libpq-dev \
libssh2-1-dev \
libcurl4-openssl-dev \
locales \
&& apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \
&& add-apt-repository 'deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu xenial/' \
&& apt-get update \
&& apt-get install -y --no-install-recommends r-base=${R_BASE_VERSION} r-base-dev=${R_BASE_VERSION}
RUN mkdir -p /usr/lib/R/etc/ \
&& echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/lib/R/etc/Rprofile.site \
## Add a library directory (for user-installed packages)
&& mkdir -p /usr/local/lib/R/site-library \
&& mkdir -p /usr/lib/R/site-library \
## Fix library path
&& echo "R_LIBS_USER='/usr/local/lib/R/site-library'" >> /usr/local/lib/R/etc/Renviron \
&& echo "R_LIBS=\${R_LIBS-'/usr/local/lib/R/site-library:/usr/local/lib/R/library:/usr/lib/R/library'}" >> /usr/local/lib/R/etc/Renviron \
&& echo "R_LIBS_USER='/usr/lib/R/site-library'" >> /usr/lib/R/etc/Renviron \
&& echo "R_LIBS=\${R_LIBS-'/usr/lib/R/site-library:/usr/lib/R/library:/usr/lib/R/library'}" >> /usr/lib/R/etc/Renviron \
## install packages from date-locked MRAN snapshot of CRAN
&& [ -z "$BUILD_DATE" ] && BUILD_DATE=$(TZ="America/Los_Angeles" date -I) || true \
&& MRAN=https://mran.microsoft.com/snapshot/${BUILD_DATE} \
&& echo MRAN=$MRAN >> /etc/environment \
&& export MRAN=$MRAN \
&& echo "options(repos = c(CRAN='$MRAN'), download.file.method = 'libcurl');" >> /usr/local/lib/R/etc/Rprofile.site \
&& echo "Sys.setenv(SPARK_HOME ='"$SPARK_HOME"');" >> /usr/local/lib/R/etc/Rprofile.site \
&& Rscript -e "install.packages(c('littler', 'docopt', 'tidyverse', 'sparklyr', 'keras', 'tensorflow'), repo = '$MRAN')" \
&& chown -R root:staff /usr/local/lib/R/site-library \
&& chmod -R g+wx /usr/local/lib/R/site-library \
&& ln -s /usr/local/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
&& ln -s /usr/local/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
&& ln -s /usr/local/lib/R/site-library/littler/bin/r /usr/local/bin/r \
## TEMPORARY WORKAROUND to get more robust error handling for install2.r prior to littler update
&& curl -O /usr/local/bin/install2.r https://github.com/eddelbuettel/littler/raw/master/inst/examples/install2.r \
&& chmod +x /usr/local/bin/install2.r \
&& echo "options(repos = c(CRAN='$MRAN'), download.file.method = 'libcurl'); Sys.setenv(SPARK_HOME ='"$SPARK_HOME"')" >> /usr/lib/R/etc/Rprofile.site \
## Use littler installation scripts
&& Rscript -e "install.packages(c('dplyr', 'docopt', 'tidyverse', 'sparklyr'), repo = '$MRAN', dependencies=TRUE)" \
&& chown -R root:staff /usr/lib/R/site-library \
&& chmod -R g+wx /usr/lib/R/site-library \
&& ln -s /usr/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
&& ln -s /usr/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
&& ln -s /usr/lib/R/site-library/littler/bin/r /usr/local/bin/r \
## Clean up from R source install
&& cd / \
&& rm -rf /tmp/* \
&& apt-get autoremove -y \
&& apt-get autoclean -y \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get autoclean -y
RUN rm /usr/bin/python \
&& ln -s /usr/bin/python3.5 /usr/bin/python
RUN echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8
CMD ["/bin/bash"]

View file

@ -0,0 +1,56 @@
FROM aztk/spark:v0.1.0-spark2.3.0-base
ARG R_VERSION=3.4.4
ARG R_BASE_VERSION=${R_VERSION}-1xenial0
ARG BUILD_DATE
RUN apt-get update \
&& apt-get install -y --no-install-recommends apt-transport-https \
libxml2-dev \
libcairo2-dev \
libsqlite-dev \
libmariadbd-dev \
libmariadb-client-lgpl-dev \
libpq-dev \
libssh2-1-dev \
libcurl4-openssl-dev \
locales \
&& apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \
&& add-apt-repository 'deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu xenial/' \
&& apt-get update \
&& apt-get install -y --no-install-recommends r-base=${R_BASE_VERSION} r-base-dev=${R_BASE_VERSION}
RUN mkdir -p /usr/lib/R/etc/ \
&& echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/lib/R/etc/Rprofile.site \
## Add a library directory (for user-installed packages)
&& mkdir -p /usr/lib/R/site-library \
## Fix library path
&& echo "R_LIBS_USER='/usr/lib/R/site-library'" >> /usr/lib/R/etc/Renviron \
&& echo "R_LIBS=\${R_LIBS-'/usr/lib/R/site-library:/usr/lib/R/library:/usr/lib/R/library'}" >> /usr/lib/R/etc/Renviron \
## install packages from date-locked MRAN snapshot of CRAN
&& [ -z "$BUILD_DATE" ] && BUILD_DATE=$(TZ="America/Los_Angeles" date -I) || true \
&& MRAN=https://mran.microsoft.com/snapshot/${BUILD_DATE} \
&& echo MRAN=$MRAN >> /etc/environment \
&& export MRAN=$MRAN \
&& echo "options(repos = c(CRAN='$MRAN'), download.file.method = 'libcurl'); Sys.setenv(SPARK_HOME ='"$SPARK_HOME"')" >> /usr/lib/R/etc/Rprofile.site \
## Use littler installation scripts
&& Rscript -e "install.packages(c('dplyr', 'docopt', 'tidyverse', 'sparklyr'), repo = '$MRAN', dependencies=TRUE)" \
&& chown -R root:staff /usr/lib/R/site-library \
&& chmod -R g+wx /usr/lib/R/site-library \
&& ln -s /usr/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
&& ln -s /usr/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
&& ln -s /usr/lib/R/site-library/littler/bin/r /usr/local/bin/r \
## Clean up from R source install
&& cd / \
&& rm -rf /tmp/* \
&& apt-get autoremove -y \
&& apt-get autoclean -y
RUN rm /usr/bin/python \
&& ln -s /usr/bin/python3.5 /usr/bin/python
RUN echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8
CMD ["/bin/bash"]

View file

@ -0,0 +1,56 @@
FROM aztk/spark:v0.1.0-spark2.3.0-gpu
ARG R_VERSION=3.4.4
ARG R_BASE_VERSION=${R_VERSION}-1xenial0
ARG BUILD_DATE
RUN apt-get update \
&& apt-get install -y --no-install-recommends apt-transport-https \
libxml2-dev \
libcairo2-dev \
libsqlite-dev \
libmariadbd-dev \
libmariadb-client-lgpl-dev \
libpq-dev \
libssh2-1-dev \
libcurl4-openssl-dev \
locales \
&& apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \
&& add-apt-repository 'deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu xenial/' \
&& apt-get update \
&& apt-get install -y --no-install-recommends r-base=${R_BASE_VERSION} r-base-dev=${R_BASE_VERSION}
RUN mkdir -p /usr/lib/R/etc/ \
&& echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/lib/R/etc/Rprofile.site \
## Add a library directory (for user-installed packages)
&& mkdir -p /usr/lib/R/site-library \
## Fix library path
&& echo "R_LIBS_USER='/usr/lib/R/site-library'" >> /usr/lib/R/etc/Renviron \
&& echo "R_LIBS=\${R_LIBS-'/usr/lib/R/site-library:/usr/lib/R/library:/usr/lib/R/library'}" >> /usr/lib/R/etc/Renviron \
## install packages from date-locked MRAN snapshot of CRAN
&& [ -z "$BUILD_DATE" ] && BUILD_DATE=$(TZ="America/Los_Angeles" date -I) || true \
&& MRAN=https://mran.microsoft.com/snapshot/${BUILD_DATE} \
&& echo MRAN=$MRAN >> /etc/environment \
&& export MRAN=$MRAN \
&& echo "options(repos = c(CRAN='$MRAN'), download.file.method = 'libcurl'); Sys.setenv(SPARK_HOME ='"$SPARK_HOME"')" >> /usr/lib/R/etc/Rprofile.site \
## Use littler installation scripts
&& Rscript -e "install.packages(c('dplyr', 'docopt', 'tidyverse', 'sparklyr'), repo = '$MRAN', dependencies=TRUE)" \
&& chown -R root:staff /usr/lib/R/site-library \
&& chmod -R g+wx /usr/lib/R/site-library \
&& ln -s /usr/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
&& ln -s /usr/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
&& ln -s /usr/lib/R/site-library/littler/bin/r /usr/local/bin/r \
## Clean up from R source install
&& cd / \
&& rm -rf /tmp/* \
&& apt-get autoremove -y \
&& apt-get autoclean -y
RUN rm /usr/bin/python \
&& ln -s /usr/bin/python3.5 /usr/bin/python
RUN echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8
CMD ["/bin/bash"]

View file

@ -14,7 +14,7 @@ The script outputs all of the necessary information to use `aztk`, just copy the
## Usage
Copy and paste the following into an [Azure Cloud Shell](https://shell.azure.com):
```sh
wget -q https://raw.githubusercontent.com/Azure/aztk/master/account_setup.sh &&
wget -q https://raw.githubusercontent.com/Azure/aztk/v0.7.0/account_setup.sh &&
chmod 755 account_setup.sh &&
/bin/bash account_setup.sh
```

View file

@ -1,36 +1,9 @@
# Docker
Azure Distributed Data Engineering Toolkit runs Spark on Docker.
Supported Azure Distributed Data Engineering Toolkit images are hosted publicly on [Docker Hub](https://hub.docker.com/r/aztk/base/tags).
Supported Azure Distributed Data Engineering Toolkit images are hosted publicly on [Docker Hub](https://hub.docker.com/r/aztk/spark/).
## Versioning with Docker
The default image that this package uses is the __aztk-base__ Docker image that comes with **Spark v2.2.0**.
You can use several versions of the __aztk-base__ image:
- Spark 2.2.0 - aztk/base:spark2.2.0 (default)
- Spark 2.1.0 - aztk/base:spark2.1.0
- Spark 1.6.3 - aztk/base:spark1.6.3
To enable GPUs you may use any of the following images, which are based upon the __aztk-base__ images. Each of these images contains CUDA-8.0 and cuDNN-6.0. By default, these images are used if the selected VM type has a GPU.
- Spark 2.2.0 - aztk/gpu:spark2.2.0 (default)
- Spark 2.1.0 - aztk/gpu:spark2.1.0
- Spark 1.6.3 - aztk/gpu:spark1.6.3
We also provide two other image types tailored for Python and R users: __aztk-r__ and __aztk-python__. You can choose between the following:
- Anaconda3-5.0.0 (Python 3.6.2) / Spark 2.2.0 - aztk/python:spark2.2.0-python3.6.2-base
- Anaconda3-5.0.0 (Python 3.6.2) / Spark 2.1.0 - aztk/python:spark2.1.0-python3.6.2-base
- Anaconda3-5.0.0 (Python 3.6.2) / Spark 1.6.3 - aztk/python:spark1.6.3-python3.6.2-base
- R 3.4.1 / Spark v2.2.0 - aztk/r-base:spark2.2.0-r3.4.1-base
- R 3.4.1 / Spark v2.1.0 - aztk/r-base:spark2.1.0-r3.4.1-base
- R 3.4.1 / Spark v1.6.3 - aztk/r-base:spark1.6.3-r3.4.1-base
Please note that each of these images also has a GPU-enabled version. To use the GPU versions, replace the "-base" part of the Docker image tag with "-gpu":
- Anaconda3-5.0.0 (Python 3.6.2) / Spark 2.2.0 (GPU) - aztk/python:spark2.2.0-python3.6.2-gpu
- Anaconda3-5.0.0 (Python 3.6.2) / Spark 2.1.0 (GPU) - aztk/python:spark2.1.0-python3.6.2-gpu
- Anaconda3-5.0.0 (Python 3.6.2) / Spark 1.6.3 (GPU) - aztk/python:spark1.6.3-python3.6.2-gpu
*Today, these supported images are hosted on Docker Hub under the repo ["base/gpu/python/r-base:<tag>"](https://hub.docker.com/r/aztk).*
By default, the `aztk/spark:v0.1.0-spark2.3.0-base` image will be used.
To select an image other than the default, you can set your Docker image at cluster creation time with the optional **--docker-repo** parameter:
@ -38,17 +11,13 @@ To select an image other than the default, you can set your Docker image at clus
aztk spark cluster create ... --docker-repo <name_of_docker_image_repo>
```
For example, if I wanted to use Spark v1.6.3, I could run the following cluster create command:
For example, if I wanted to use Spark v2.2.0, I could run the following cluster create command:
```sh
aztk spark cluster create ... --docker-repo aztk/base:spark1.6.3
```
## Using a custom Docker Image
What if I wanted to use my own Docker image?
You can build your own Docker image on top or beneath one of our supported base images _OR_ you can modify the [supported Dockerfile](../docker-image) and build your own image that way.
Please refer to ['../docker-image'](../docker-image) for more information on building your own image.
You can build your own Docker image on top or beneath one of our supported base images _OR_ you can modify the [supported Dockerfiles](https://github.com/Azure/aztk/tree/v0.7.0/docker-image) and build your own image that way.
Once you have your Docker image built and hosted publicly, you can then use the **--docker-repo** parameter in your **aztk spark cluster create** command to point to it.
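For instance, if you had pushed a hypothetical image named `my_username/my_repo` to Docker Hub (the repository name below is a placeholder, not a published image), the create command might look like this:
```sh
# 'my_username/my_repo:latest' is a hypothetical, publicly hosted image
aztk spark cluster create ... --docker-repo my_username/my_repo:latest
```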
@ -70,3 +39,57 @@ docker:
password: <mypassword>
endpoint: <https://my-custom-docker-endpoint.com>
```
### Building Your Own Docker Image
Building your own Docker image gives you more control over your cluster's environment. For some, this may mean installing specific, and even private, libraries that their Spark jobs require. For others, it may just mean setting up a version of Spark, Python, or R that fits their particular needs.
The Azure Distributed Data Engineering Toolkit supports custom Docker images. To guarantee that your Spark deployment works, we recommend that you build on top of one of our supported images.
To build your own image, you can either build _on top_ of or _beneath_ one of our supported images _OR_ you can just modify one of the supported Dockerfiles to build your own.
### Building on top
You can build on top of our images by referencing the __aztk/spark__ image in the **FROM** keyword of your Dockerfile:
```sh
# Your custom Dockerfile
FROM aztk/spark:v0.1.0-spark2.3.0-base
...
```
### Building beneath
To build beneath one of our images, modify one of our Dockerfiles so that the **FROM** keyword pulls from your Docker image's location (as opposed to the default, which is a base Ubuntu image):
```sh
# One of the Dockerfiles that AZTK supports
# Change the FROM statement to point to your hosted image repo
FROM my_username/my_repo:latest
...
```
Please note that for this method to work, your Docker image must have been built on Ubuntu.
## Custom Docker Image Requirements
If you are building your own custom image and are __not__ building on top of a supported image, the following requirements apply.
Please make sure that the following environment variables are set:
- AZTK_DOCKER_IMAGE_VERSION
- JAVA_HOME
- SPARK_HOME
You also need to make sure that __PATH__ is correctly configured with $SPARK_HOME:
- PATH=$SPARK_HOME/bin:$PATH
By default, these are set as follows:
``` sh
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
ENV SPARK_HOME /home/spark-current
ENV PATH $SPARK_HOME/bin:$PATH
```
If you are using your own version of Spark, make sure that it is symlinked at "/home/spark-current". **$SPARK_HOME** must also point to "/home/spark-current".
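As a rough sketch (assuming a hypothetical Spark 2.3.0 tarball from the Apache archive; adjust the version and URL to your own build), the relevant Dockerfile fragment might look like:
```sh
# Hypothetical fragment: the Spark distribution below is a placeholder for your own build
RUN wget -q https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz \
    && tar -xzf spark-2.3.0-bin-hadoop2.7.tgz -C /home \
    && rm spark-2.3.0-bin-hadoop2.7.tgz \
    && ln -s /home/spark-2.3.0-bin-hadoop2.7 /home/spark-current
ENV SPARK_HOME /home/spark-current
ENV PATH $SPARK_HOME/bin:$PATH
```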
## Hosting your Docker Image
By default, this toolkit assumes that your Docker images are publicly hosted on Docker Hub. However, we also support hosting your images privately.
See [here](https://github.com/Azure/aztk/blob/v0.7.0/docs/12-docker-image.md#using-a-custom-docker-image-that-is-privately-hosted) to learn more about using privately hosted Docker Images.
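As a rough sketch of the publishing step (the repository name and tag are placeholders), hosting an image publicly is a normal build-and-push to Docker Hub; a privately hosted image additionally needs the `docker:` credentials block shown above in your secrets configuration:
```sh
# Hypothetical repository and tag; replace with your own
docker build -t my_username/my_repo:v0.1.0-spark2.3.0-custom .
docker push my_username/my_repo:v0.1.0-spark2.3.0-custom
```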

View file

@ -22,8 +22,8 @@ size: 2
# username: <username for the linux user to be created> (optional)
username: spark
# docker_repo: <name of docker image repo (for more information, see https://github.com/Azure/aztk/blob/master/docs/12-docker-image.html)>
docker_repo: aztk/base:spark2.2.0
# docker_repo: <name of docker image repo (for more information, see https://github.com/Azure/aztk/blob/v0.7.0/docs/12-docker-image.md)>
docker_repo: aztk/base:v0.1.0-spark2.3.0-base
# custom_script: <path to custom script to run on each node> (optional)