pai/examples
Ziming Miao 1e363c09ae
fix keras TensorFlow backend example (#1130)
2018-08-31 15:05:38 +08:00

OpenPAI Job Examples

Quick start: how to write and submit a CIFAR-10 job

(1) Prepare a job JSON file

This section uses a CIFAR-10 training job as an example to explain how to write and submit a job in OpenPAI.

CIFAR-10 is an established computer-vision dataset used for image classification.

  • Full example of a TensorFlow CIFAR-10 image classification training job on OpenPAI:
{
  // Name for the job; must be unique
  "jobName": "tensorflow-cifar10",
  // URL pointing to the Docker image for all tasks in the job
  "image": "openpai/pai.example.tensorflow",
  // Data directory existing on HDFS
  "dataDir": "/tmp/data",
  // Output directory on HDFS
  "outputDir": "/tmp/output",
  // List of task roles; at least one task role is required
  "taskRoles": [
    {
      // Name for the task role
      "name": "cifar_train",
      // Number of tasks for the task role, no less than 1
      "taskNumber": 1,
      // CPU number for one task in the task role, no less than 1
      "cpuNumber": 8,
      // Memory in MB for one task in the task role, no less than 100
      "memoryMB": 32768,
      // GPU number for one task in the task role, no less than 0
      "gpuNumber": 1,
      // Executable command for tasks in the task role; cannot be empty
      "command": "git clone https://github.com/tensorflow/models && cd models/research/slim && python download_and_convert_data.py --dataset_name=cifar10 --dataset_dir=$PAI_DATA_DIR && python train_image_classifier.py --batch_size=64 --model_name=inception_v3 --dataset_name=cifar10 --dataset_split_name=train --dataset_dir=$PAI_DATA_DIR --train_dir=$PAI_OUTPUT_DIR"
    }
  ]
}
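
The limits stated in the comments above can be checked locally before a job is submitted. The following is a minimal sketch, not part of OpenPAI: `validate_job` is a hypothetical helper, it assumes the job file has already been loaded into a Python dict with the `//` comments removed (strict JSON parsers reject them), and it assumes every task role sets all four resource fields as the example does. The `command` value is shortened here for readability.

```python
def validate_job(job):
    """Return a list of problems found in a job config dict (empty if none)."""
    problems = []
    task_roles = job.get("taskRoles") or []
    if len(task_roles) < 1:
        problems.append("taskRoles needs at least one task role")
    # Lower bounds taken from the comments in the example job file
    minimums = {"taskNumber": 1, "cpuNumber": 1, "memoryMB": 100, "gpuNumber": 0}
    for role in task_roles:
        name = role.get("name", "<unnamed>")
        for key, minimum in minimums.items():
            value = role.get(key)
            if not isinstance(value, int) or value < minimum:
                problems.append(f"{name}: {key} must be an integer >= {minimum}")
        if not role.get("command"):
            problems.append(f"{name}: command cannot be empty")
    return problems


# The example job from this section, with a shortened command
job = {
    "jobName": "tensorflow-cifar10",
    "image": "openpai/pai.example.tensorflow",
    "dataDir": "/tmp/data",
    "outputDir": "/tmp/output",
    "taskRoles": [
        {
            "name": "cifar_train",
            "taskNumber": 1,
            "cpuNumber": 8,
            "memoryMB": 32768,
            "gpuNumber": 1,
            "command": "python train_image_classifier.py --dataset_name=cifar10",
        }
    ],
}

print(validate_job(job))  # prints []
```

A check like this only catches values outside the documented ranges. It cannot tell whether the job name is unique on the cluster or whether the Docker image exists.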

(2) Submit the job JSON file from the OpenPAI web portal

To submit the job from the OpenPAI web portal, refer to the tutorial "submit a job in web portal".

List of off-the-shelf examples

Examples that can be run by submitting the JSON file directly, without any modification.

List of customized job template

Users can customize these job templates and run them on OpenPAI.

Contributing

If you want to contribute a job example that can be run on PAI, please open a new pull request.

  • Create a folder under pai/examples, for example pai/examples/caffe2/

  • Prepare example files:

    In the Caffe2 example folder, prepare the following files for the contribution PR:

  1. README.md: an introduction to the example
  2. Dockerfile: the example's dependencies
  3. PAI job JSON file: the example's OpenPAI job JSON template
  4. [Optional] Code file: the example's code
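
Before opening the pull request, the folder can be checked against this list. The sketch below is an illustration under stated assumptions: `missing_example_files` is a hypothetical helper, and it accepts any `*.json` file as the job template because the list above does not fix a file name. The optional code file is not checked.

```python
from pathlib import Path


def missing_example_files(example_dir):
    """Return the required files that are absent from an example folder."""
    folder = Path(example_dir)
    # README.md and Dockerfile are required by name
    missing = [
        name for name in ("README.md", "Dockerfile")
        if not (folder / name).is_file()
    ]
    # Assumption: any *.json file in the folder counts as the job template
    if not folder.is_dir() or not any(folder.glob("*.json")):
        missing.append("job json file")
    return missing
```

Calling `missing_example_files("pai/examples/caffe2")` returns an empty list when all three required files are present, and otherwise names the ones that still need to be added.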