batch-shipyard/recipes/TensorFlow-CPU/README.md

2.8 KiB

TensorFlow-CPU

This recipe shows how to run TensorFlow on a single node using a CPU only.

Configuration

Please see refer to this set of sample configuration files for this recipe.

Pool Configuration

The pool configuration should enable the following properties:

  • max_tasks_per_node must be set to 1 or omitted

Other pool properties such as publisher, offer, sku, vm_size and vm_count should be set to your desired values.

Global Configuration

The global configuration should set the following properties:

  • docker_images array must have a reference to a valid TensorFlow Docker image that can execute on CPUs. The official Google TensorFlow image gcr.io/tensorflow/tensorflow can work with this recipe.

Jobs Configuration

The jobs configuration should set the following properties within the tasks array to run the MNIST convolutional example. This array should have a task definition containing:

  • docker_image should be the name of the Docker image for this container invocation, e.g., gcr.io/tensorflow/tensorflow
  • resource_files array should be populated if you want Azure Batch to handle the download of the training file from the web endpoint:
    • file_path is the local file path which should be set to convolutional.py
    • blob_source is the remote URL of the file to retrieve: https://raw.githubusercontent.com/tensorflow/models/master/tutorials/image/mnist/convolutional.py
  • command should contain the command to pass to the Docker run invocation. To run the MNIST convolutional example, the command would be: python -u convolutional.py

Tensorboard

If you would like to tunnel Tensorboard to your local machine, use the jobs-tb.yaml file instead. This requires that a pool SSH user was added, and ssh or ssh.exe is available. This configuration will output summary data to the directory specified in the --log_dir parameter. After the job is submitted, you can start the remote Tensorboard instance with the command:

shipyard misc tensorboard

Which will output some text similar to the following:

>> Please connect to Tensorboard at http://localhost:6006/

>> Note that Tensorboard may take a while to start if the Docker image is
>> not present. Please keep retrying the URL every few seconds.

>> Terminate your session with CTRL+C

>> If you cannot terminate your session cleanly, run:
     shipyard pool ssh --nodeid tvm-1518333292_4-20170428t151941z sudo docker kill 9e7879b8

With a web browser, navigate to http://localhost:6006/ where Tensorboard will be displayed.

Note that the task does not have to be completed for Tensorboard to be run, it can be running while Tensorboard is running.