Use Docker to package job environment dependencies

The system launches a deep learning job in one or more Docker containers, so a Docker image must be prepared in advance. The system provides a base Docker image with HDFS, CUDA, and cuDNN support, on top of which users can build their own custom Docker images.

To build a base Docker image, for example from Dockerfile.build.base, run:

docker build -f Dockerfiles/Dockerfile.build.base -t pai.build.base:hadoop2.7.2-cuda8.0-cudnn6-devel-ubuntu16.04 Dockerfiles/

A custom Docker image can then be built on top of it by declaring it as the base image with FROM pai.build.base:hadoop2.7.2-cuda8.0-cudnn6-devel-ubuntu16.04 in the custom Dockerfile.
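
As a minimal sketch of that step, the script below writes a custom Dockerfile whose first instruction is the FROM line above, then builds it. The helper names `write_custom_dockerfile` and `build_custom_image`, the `DRY_RUN` switch, and the `RUN` step installing `python3-pip` are all hypothetical placeholders for illustration; replace the `RUN` step with your job's actual dependencies.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Base image built in the previous step (tag taken from the command above).
BASE_IMAGE="pai.build.base:hadoop2.7.2-cuda8.0-cudnn6-devel-ubuntu16.04"

# write_custom_dockerfile DIR (hypothetical helper): create DIR/Dockerfile
# whose first instruction is FROM ${BASE_IMAGE}.
write_custom_dockerfile() {
  local dir="$1"
  mkdir -p "$dir"
  # Unquoted EOF so ${BASE_IMAGE} is expanded; "\\" emits a literal backslash.
  cat > "$dir/Dockerfile" <<EOF
FROM ${BASE_IMAGE}

# Hypothetical example step: install an extra system package for the job.
RUN apt-get update && \\
    apt-get install -y --no-install-recommends python3-pip && \\
    rm -rf /var/lib/apt/lists/*
EOF
}

# build_custom_image DIR TAG (hypothetical helper): build DIR/Dockerfile as TAG.
# With DRY_RUN=1 the docker command is only printed, not executed.
build_custom_image() {
  local dir="$1" tag="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker build -f $dir/Dockerfile -t $tag $dir"
  else
    docker build -f "$dir/Dockerfile" -t "$tag" "$dir"
  fi
}
```

For example, `write_custom_dockerfile my-image` followed by `build_custom_image my-image my.custom.image` produces and builds an image layered on the base image; `my-image` and `my.custom.image` are made-up names.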

For example, the following command builds a custom TensorFlow image from Dockerfile.run.tensorflow:

docker build -f Dockerfiles/Dockerfile.run.tensorflow -t pai.run.tensorflow Dockerfiles/

Next, tag the built image and push it to a Docker registry so that every node in the system can pull it:

docker tag pai.run.tensorflow your_docker_registry/pai.run.tensorflow
docker push your_docker_registry/pai.run.tensorflow

The image is now ready to use. Note that the commands above assume the Docker registry is deployed locally; the actual commands may vary depending on how your Docker registry is configured.
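
Because only the registry address tends to change between deployments, the tag-and-push step can be parameterized. The sketch below assumes a hypothetical helper `push_image IMAGE REGISTRY` and a hypothetical `DRY_RUN` switch that prints the docker commands instead of running them. REGISTRY is whatever host[:port] your registry is reachable at; for a registry deployed locally with the official `registry:2` image (for example `docker run -d -p 5000:5000 --name registry registry:2`), that would be `localhost:5000`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_or_print CMD...: run the command, or only print it when DRY_RUN=1.
run_or_print() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# push_image IMAGE REGISTRY (hypothetical helper): tag IMAGE as
# REGISTRY/IMAGE and push it, so every node that can reach REGISTRY can pull it.
push_image() {
  local image="$1" registry="${2%/}"   # drop a trailing slash, if any
  local target="${registry}/${image}"
  run_or_print docker tag "$image" "$target"
  run_or_print docker push "$target"
}
```

For example, `push_image pai.run.tensorflow your_docker_registry` issues the same two commands as the tag and push lines above; running it once with `DRY_RUN=1` lets you review the commands before anything is pushed.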