batch-shipyard/recipes/TensorFlow-CPU/README.md

70 строки
2.8 KiB
Markdown

# TensorFlow-CPU
This recipe shows how to run [TensorFlow](https://www.tensorflow.org/)
on a single node using a CPU only.
## Configuration
Please see refer to this [set of sample configuration files](./config) for
this recipe.
### Pool Configuration
The pool configuration should enable the following properties:
* `max_tasks_per_node` must be set to 1 or omitted
Other pool properties such as `publisher`, `offer`, `sku`, `vm_size` and
`vm_count` should be set to your desired values.
### Global Configuration
The global configuration should set the following properties:
* `docker_images` array must have a reference to a valid TensorFlow Docker
image that can execute on CPUs. The official Google TensorFlow image
[gcr.io/tensorflow/tensorflow](https://www.tensorflow.org/install/install_linux#InstallingDocker)
can work with this recipe.
### Jobs Configuration
The jobs configuration should set the following properties within the `tasks`
array to run the
[MNIST convolutional example](https://github.com/tensorflow/models/tree/master/tutorials/image/mnist).
This array should have a task definition containing:
* `docker_image` should be the name of the Docker image for this container invocation,
e.g., `gcr.io/tensorflow/tensorflow`
* `resource_files` array should be populated if you want Azure Batch to handle
the download of the training file from the web endpoint:
* `file_path` is the local file path which should be set to
`convolutional.py`
* `blob_source` is the remote URL of the file to retrieve:
`https://raw.githubusercontent.com/tensorflow/models/master/tutorials/image/mnist/convolutional.py`
* `command` should contain the command to pass to the Docker run invocation.
To run the MNIST convolutional example, the `command` would be:
`python -u convolutional.py`
### Tensorboard
If you would like to tunnel Tensorboard to your local machine, use the
`jobs-tb.yaml` file instead. This requires that a pool SSH user was added,
and `ssh` or `ssh.exe` is available. This configuration will output summary
data to the directory specified in the `--log_dir` parameter. After the job
is submitted, you can start the remote Tensorboard instance with the command:
```shell
shipyard misc tensorboard
```
Which will output some text similar to the following:
```
>> Please connect to Tensorboard at http://localhost:6006/
>> Note that Tensorboard may take a while to start if the Docker image is
>> not present. Please keep retrying the URL every few seconds.
>> Terminate your session with CTRL+C
>> If you cannot terminate your session cleanly, run:
shipyard pool ssh --nodeid tvm-1518333292_4-20170428t151941z sudo docker kill 9e7879b8
```
With a web browser, navigate to http://localhost:6006/ where Tensorboard
will be displayed.
Note that the task does not have to be completed for Tensorboard to be run,
it can be running while Tensorboard is running.