Microsoft OpenPAI Runtime

Runtime component for deep learning workloads

To better support deep learning workloads, OpenPAI implements "PAI Runtime", a module that provides runtime support to job containers.

One major feature of PAI Runtime is the instantiation of runtime environment variables. PAI Runtime provides several built-in runtime environment variables, including the container's role name and index, and the IP addresses and ports of all containers used in the job. With these environment variables and Framework Controller, users can onboard custom workloads (e.g., MPI, TensorBoard) without involving or modifying the OpenPAI platform itself. OpenPAI also lets users define custom runtime environment variables tailored to their workloads.
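
As an illustration of how a job script might consume these variables, here is a minimal Python sketch. The variable names used (`PAI_CURRENT_TASK_ROLE_NAME`, `PAI_CURRENT_TASK_ROLE_CURRENT_TASK_INDEX`, and the `PAI_HOST_IP_` prefix) are assumptions made for this example and are not stated in this README; check the OpenPAI documentation for the names your cluster actually exports. The helper `describe_container` is hypothetical.

```python
import os
from typing import Mapping, Optional

# Assumed variable names, for illustration only; verify against your cluster.
ROLE_VAR = "PAI_CURRENT_TASK_ROLE_NAME"
INDEX_VAR = "PAI_CURRENT_TASK_ROLE_CURRENT_TASK_INDEX"
HOST_IP_PREFIX = "PAI_HOST_IP_"


def describe_container(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Collect this container's role, index and peer IPs from the environment."""
    env = os.environ if environ is None else environ
    role = env.get(ROLE_VAR, "")
    index = int(env.get(INDEX_VAR, "0"))
    # Every variable starting with the prefix is treated as one peer container.
    peers = {
        key[len(HOST_IP_PREFIX):]: value
        for key, value in env.items()
        if key.startswith(HOST_IP_PREFIX)
    }
    return {"role": role, "index": index, "peers": peers}


if __name__ == "__main__":
    print(describe_container())
```

Passing a plain dictionary instead of relying on `os.environ` makes the helper easy to exercise outside a job container.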

Another major feature of PAI Runtime is the "PAI runtime plugin". A runtime plugin lets users customize the runtime behavior of a job container. Essentially, a plugin is a generic way for users to inject code during container initialization or termination. OpenPAI implements several built-in plugins, including a storage plugin that mounts a remote storage service inside the job containers, an SSH plugin that supports SSH access to each container, and a failure analysis plugin that analyzes the failure reason when a container fails. We envision more features being implemented through the plugin mechanism.
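
The idea of injecting code at container initialization and termination can be sketched conceptually in Python. This is not the plugin API of PAI Runtime; the `HookRegistry` class and its methods are hypothetical names invented for this example, and the real plugins are configured through the job, not through code like this.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class HookRegistry:
    """Hypothetical registry of code to run around a workload's lifetime."""

    init_hooks: List[Callable[[], None]] = field(default_factory=list)
    exit_hooks: List[Callable[[], None]] = field(default_factory=list)

    def on_init(self, hook: Callable[[], None]) -> Callable[[], None]:
        """Register code to run before the workload starts."""
        self.init_hooks.append(hook)
        return hook

    def on_exit(self, hook: Callable[[], None]) -> Callable[[], None]:
        """Register code to run after the workload ends, even if it fails."""
        self.exit_hooks.append(hook)
        return hook

    def run(self, workload: Callable[[], Any]) -> Any:
        for hook in self.init_hooks:
            hook()
        try:
            return workload()
        finally:
            # Tear down in reverse registration order.
            for hook in reversed(self.exit_hooks):
                hook()


if __name__ == "__main__":
    registry = HookRegistry()
    registry.on_init(lambda: print("mount storage"))
    registry.on_exit(lambda: print("collect failure info"))
    registry.run(lambda: print("user job runs here"))
```

Running the exit hooks in a `finally` block mirrors the requirement that termination-time code, such as failure analysis, still runs when the workload itself fails.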

Features

  1. Prepare OpenPAI runtime environment variables
  2. Failure analysis: reports the likely reason for a job failure based on the failure pattern
  3. Storage plugin: automatically mounts remote storage according to the storage config
  4. SSH plugin: supports SSH access to job containers
  5. Cmd plugin: runs custom commands before and after the job
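
To make "failure pattern" concrete, the following Python sketch matches log text against a small table of regular expressions and returns the first matching reason. The patterns, the reason strings, and the `diagnose` function are hypothetical examples chosen for illustration; they do not reproduce the rules that PAI Runtime actually ships.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical (reason, pattern) pairs; the first match wins.
FAILURE_PATTERNS: List[Tuple[str, re.Pattern]] = [
    ("out of GPU memory", re.compile(r"CUDA out of memory", re.IGNORECASE)),
    ("disk full", re.compile(r"No space left on device")),
    ("command not found", re.compile(r"command not found")),
]


def diagnose(log_text: str) -> Optional[str]:
    """Return the first failure reason whose pattern appears in the log."""
    for reason, pattern in FAILURE_PATTERNS:
        if pattern.search(log_text):
            return reason
    return None


if __name__ == "__main__":
    print(diagnose("RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB"))
```

Keeping the rules in a data table rather than in branching code means new failure patterns can be added without touching the matching logic.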

How to build

To build the openpai-runtime Docker image, run `docker build -f ./build/openpai-runtime.dockerfile .` from the repository root.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.