mirror of https://github.com/microsoft/pai.git
Add docker cache doc (#5349)
Parent: 8e1bb28533
Commit: 724961c538
|
@ -16,9 +16,10 @@ This manual is for cluster administrators to learn the installation and uninstal
 6. [How to Set Up Virtual Clusters](./how-to-set-up-virtual-clusters.md)
 7. [How to Set Up Marketplace](./how-to-set-up-marketplace.md)
 8. [How to Add and Remove Nodes](./how-to-add-and-remove-nodes.md)
-9. [How to Customize Cluster by Plugins](./how-to-customize-cluster-by-plugins.md)
-10. [How to Use Alert System](./how-to-use-alert-system.md)
-11. [Troubleshooting](./troubleshooting.md)
-12. [Recommended Practice](./recommended-practice.md)
-13. [How to Uninstall OpenPAI](./how-to-uninstall-openpai.md)
-14. [Upgrade Guide](./upgrade-guide.md)
+9. [How to Set Up Docker Image Cache](./how-to-set-up-docker-image-cache.md)
+10. [How to Customize Cluster by Plugins](./how-to-customize-cluster-by-plugins.md)
+11. [How to Use Alert System](./how-to-use-alert-system.md)
+12. [Troubleshooting](./troubleshooting.md)
+13. [Recommended Practice](./recommended-practice.md)
+14. [How to Uninstall OpenPAI](./how-to-uninstall-openpai.md)
+15. [Upgrade Guide](./upgrade-guide.md)
@ -0,0 +1,123 @@
# How to Set Up Docker Image Cache

[Docker Image Cache](https://docs.docker.com/registry/recipes/mirror/), implemented as the `docker-cache` service in OpenPAI, helps admins avoid the [Docker Hub rate limit](https://www.docker.com/increase-rate-limits), which can leave service deployments or user-submitted jobs pending for a while. The cache is set up as a pull-through cache with [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) or the Linux file system as its storage backend. In addition, a utility script distributes the docker-cache config to the nodes, so admins can easily switch to their own Docker registry or pull-through cache.

Docker Image Cache can be used in three ways:

1. Boot a cache service with an Azure Blob Storage backend;
2. Boot a cache service with a Linux file system backend;
3. Use a custom registry with the cluster.

## Set Up Docker Image Cache during Installation

During installation, the only thing you need to do is edit `contrib/kubespray/config/config.yaml`. The relevant settings are the ones whose names contain "docker_cache", located in the "OpenPAI Customized Settings" section.

* `enable_docker_cache`: set to true to enable the docker-cache service. The default is false, in which case none of the following parameters take effect.
* `docker_cache_storage_backend`: selects the storage backend type. "azure" is for [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/), "filesystem" is for the Linux file system.
* `docker_cache_azure_account_name`: required when the storage backend is "azure". Set it to your [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) account name.
* `docker_cache_azure_account_key`: required when the storage backend is "azure". Set it to your base64-encoded [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) account key.
* `docker_cache_azure_container_name`: used when the storage backend is "azure". Change it only if you want docker-cache to use a specific container name. The default is "dockerregistry".
* `docker_cache_fs_mount_path`: used when the storage backend is "filesystem". Change it only if you want docker-cache to use a specific path. The default is "/var/lib/registry".
* `docker_cache_remote_url`: the remote URL of the pull-through cache. Change it only if you want to cache a remote Docker registry other than Docker Hub. The default is "https://registry-1.docker.io/".
* `docker_cache_htpasswd`: base64-encoded htpasswd auth info. Use it together with SSL as an access control method when docker-cache caches a private registry.
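
The backend-dependent requirements above can be sketched as a small pre-flight check in shell. This is only an illustration and not part of OpenPAI: the function name `check_docker_cache_config` is hypothetical, the key names follow the `config.yaml` examples in this document, each setting is assumed to sit on its own uncommented top-level line, and only the Azure account name and key are treated as mandatory because the other settings have defaults.

```bash
# check_docker_cache_config FILE
# Hypothetical helper (not shipped with OpenPAI): returns 0 when the settings
# needed by the selected storage backend are present and non-empty.
check_docker_cache_config() {
  local file="$1" backend required key missing=0

  # Read the uncommented backend value, dropping optional double quotes.
  backend=$(sed -n 's/^docker_cache_storage_backend:[[:space:]]*"\{0,1\}\([a-z]*\)"\{0,1\}.*/\1/p' "$file")

  case "$backend" in
    azure)      required="docker_cache_azure_account_name docker_cache_azure_account_key" ;;
    filesystem) required="" ;;  # the mount path falls back to its default
    *)          echo "unknown or missing docker_cache_storage_backend: '$backend'"; return 1 ;;
  esac

  for key in $required; do
    # The key must be uncommented and carry a non-empty value.
    if ! grep -Eq "^${key}:[[:space:]]*\"?[^\"[:space:]#]" "$file"; then
      echo "missing required setting: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_docker_cache_config ./contrib/kubespray/config/config.yaml` before installation would then flag an Azure setup whose account key is still commented out.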

### `config.yaml` example with Azure

``` yaml
# ...

# Optional

#######################################################################
# OpenPAI Customized Settings #
#######################################################################
# enable_hived_scheduler: true
enable_docker_cache: true
docker_cache_storage_backend: "azure"
docker_cache_azure_account_name: "forexample"
docker_cache_azure_account_key: "forexample"
# docker_cache_azure_container_name: "dockerregistry"
# docker_cache_fs_mount_path: "/var/lib/registry"
# docker_cache_remote_url: "https://registry-1.docker.io"
# docker_cache_htpasswd: ""
# enable_marketplace: "true"

# ...

```

Make sure `enable_docker_cache` is set to `true`, then finish the [installation](./installation-guide.md); the docker-cache service will be set up as part of it.

### `config.yaml` example with file system

``` yaml
# ...

# Optional

#######################################################################
# OpenPAI Customized Settings #
#######################################################################
# enable_hived_scheduler: true
enable_docker_cache: true
docker_cache_storage_backend: "filesystem"
# docker_cache_azure_account_name: ""
# docker_cache_azure_account_key: ""
# docker_cache_azure_container_name: "dockerregistry"
docker_cache_fs_mount_path: "/var/lib/registry"
# docker_cache_remote_url: "https://registry-1.docker.io"
# docker_cache_htpasswd: ""
# enable_marketplace: "true"

# ...

```

Make sure `enable_docker_cache` is set to `true`, then finish the [installation](./installation-guide.md); the docker-cache service will be set up as part of it.

### htpasswd explained

The *htpasswd* authentication backend allows you to configure basic authentication using an [Apache htpasswd file](https://httpd.apache.org/docs/2.4/programs/htpasswd.html).
The only supported password format is *bcrypt*. Entries with other hash types are ignored. The htpasswd file is loaded once, at startup. If the file is invalid, the registry will display an error and will not start.

The docker-cache service stores the htpasswd info as a k8s secret, which means `docker_cache_htpasswd` must be set to the base64-encoded content of the htpasswd file.
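
One way to produce that value is sketched below. The helper name `encode_htpasswd` is hypothetical, and the user name, password, and file path in the usage comment are placeholders; the sketch assumes the Apache `htpasswd` tool and a `base64` command are available on the machine where you edit `config.yaml`.

```bash
# Hypothetical helper: print the base64 form of an htpasswd file on a single
# line, ready to paste into docker_cache_htpasswd.
encode_htpasswd() {
  # Strip the line wrapping that some base64 implementations add.
  base64 < "$1" | tr -d '\n'
}

# Usage sketch (placeholders, adjust to your setup):
#   htpasswd -Bbn admin 'change-me' > ./htpasswd   # -B selects bcrypt
#   encode_htpasswd ./htpasswd
```

Keeping the output on one line avoids having to deal with multi-line strings in `config.yaml`.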

## Set Up Docker Image Cache for Deployed Cluster

If you have already deployed the cluster, there is no need to reinstall it to enable the docker-cache service. The suggested way is to modify `config.yaml` and then upgrade with the following commands.

```bash
echo "pai" > cluster-id # "pai" is the default cluster-id; change it if you used a different one during deployment

# assume the workdir is pai
echo "Generating services configurations..."
python3 ./contrib/kubespray/script/openpai_generator.py -l ./contrib/kubespray/config/layout.yaml -c ./contrib/kubespray/config/config.yaml -o /cluster-configuration

echo "Pushing cluster config to k8s..."
./paictl.py config push -p /cluster-configuration -m service < cluster-id

echo "Start docker-cache service..."
./paictl.py service start -n docker-cache

echo "Performing docker-cache config distribution..."
ansible-playbook -i "${HOME}/pai-deploy/cluster-cfg/hosts.yml" docker-cache-config-distribute.yml || exit $?
```

### Use Customized Registry Configuration

If you want to deploy a registry separately from the OpenPAI cluster, a simple way is to modify `./contrib/kubespray/docker-cache-config-distribute.yml`, a playbook that modifies the Docker daemon config on each node. By default, the playbook uses port `30500` of the `kube-master` node. To use a custom registry, the only change needed is to replace `{{ hostvars[groups['kube-master'][0]]['ip'] }}:30500` with the custom registry's `<ip>:<port>` string.

```yaml
- hosts: all
  become: true
  become_user: root
  gather_facts: true
  roles:
    - role: '../roles/docker-cache/install'
      vars:
        enable_docker_cache: true
        docker_cache_host: "{{ hostvars[groups['kube-master'][0]]['ip'] }}:30500"
  tasks:
    - name: Restart service docker config from /etc/docker/daemon.json after update
      ansible.builtin.systemd:
        name: docker
```
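
After the playbook has run, you may want to confirm on a node that the daemon config points at the intended registry. The sketch below is only an assumption-laden illustration: the source does not show what the `docker-cache/install` role writes, so it assumes the address ends up under Docker's `registry-mirrors` option in `/etc/docker/daemon.json`, and the helper name `has_registry_mirror` is hypothetical.

```bash
# has_registry_mirror FILE HOST_PORT
# Hypothetical check: succeeds when HOST_PORT appears inside the
# "registry-mirrors" array of the given Docker daemon config file.
has_registry_mirror() {
  local file="$1" host="$2"
  # Flatten the JSON to one line, isolate the registry-mirrors array,
  # then look for the host as a fixed string.
  tr -d '\n' < "$file" \
    | grep -o '"registry-mirrors"[[:space:]]*:[[:space:]]*\[[^]]*]' \
    | grep -qF "$host"
}

# Usage sketch (the address is a placeholder for your <ip>:<port>):
#   has_registry_mirror /etc/docker/daemon.json "10.0.0.1:30500" && echo "mirror configured"
```

A text match like this is deliberately crude; if `jq` is available on the nodes, parsing the JSON properly would be more robust.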
|
|
@ -224,10 +224,20 @@ docker_image_tag: v1.5.0

# Optional


#######################################################################
# OpenPAI Customized Settings #
#######################################################################
# enable_hived_scheduler: true
# enable_docker_cache: true
# docker_cache_storage_backend: "azure" # or "filesystem"
# docker_cache_azure_account_name: ""
# docker_cache_azure_account_key: ""
# docker_cache_azure_container_name: "dockerregistry"
# docker_cache_fs_mount_path: "/var/lib/registry"
# docker_cache_remote_url: "https://registry-1.docker.io"
# docker_cache_htpasswd: ""
# enable_marketplace: "true"

#############################################
# Ansible-playbooks' inventory hosts' vars. #
@ -13,9 +13,10 @@ OpenPAI是一个提供完整人工智能模型训练和资源管理能力的开
 5. [How to Set Up Storage](./how-to-set-up-storage.md)
 6. [How to Set Up Virtual Clusters](./how-to-set-up-virtual-clusters.md)
 7. [How to Add and Remove Nodes](./how-to-add-and-remove-nodes.md)
-8. [How to Customize Cluster by Plugins](./how-to-customize-cluster-by-plugins.md)
-9. [How to Use Alert System](./how-to-use-alert-system.md)
-10. [Troubleshooting](./troubleshooting.md)
-11. [Recommended Practice](./recommended-practice.md)
-12. [How to Uninstall OpenPAI](./how-to-uninstall-openpai.md)
-13. [Upgrade Guide](./upgrade-guide.md)
+8. [How to Set Up Docker Image Cache](./how-to-set-up-docker-image-cache.md)
+9. [How to Customize Cluster by Plugins](./how-to-customize-cluster-by-plugins.md)
+10. [How to Use Alert System](./how-to-use-alert-system.md)
+11. [Troubleshooting](./troubleshooting.md)
+12. [Recommended Practice](./recommended-practice.md)
+13. [How to Uninstall OpenPAI](./how-to-uninstall-openpai.md)
+14. [Upgrade Guide](./upgrade-guide.md)
@ -0,0 +1,123 @@
# How to Set Up Docker Image Cache

[Docker Image Cache](https://docs.docker.com/registry/recipes/mirror/), implemented in OpenPAI as the `docker-cache` service, helps admins avoid the [Docker Hub rate limit](https://www.docker.com/increase-rate-limits). When the limit is exceeded, service deployments and user-submitted jobs are left waiting. The Docker image cache is configured as a pull-through cache with [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) or the Linux file system as its storage backend. In addition, with the provided docker-cache config distribution script, admins can easily use their own Docker registry or pull-through cache.

Docker Image Cache can be used in three ways:

1. Boot a cache service that uses Azure Blob Storage as its storage backend;
2. Boot a cache service that uses the Linux file system as its storage backend;
3. Use a custom registry.

## Configure Docker Image Cache during Installation

During installation, enabling Docker Image Cache only requires editing `contrib/kubespray/config/config.yaml`. The relevant settings are the ones whose names contain "docker_cache", located in the "OpenPAI Customized Settings" section.

* `enable_docker_cache`: set to true to use the docker-cache service. The default is false, in which case none of the following settings take effect.
* `docker_cache_storage_backend`: selects the storage backend type. "azure" uses [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/), "filesystem" uses the Linux file system.
* `docker_cache_azure_account_name`: required when the storage backend is "azure". Set it to your Azure Blob Storage account name.
* `docker_cache_azure_account_key`: required when the storage backend is "azure". Set it to your base64-encoded Azure Blob Storage account key.
* `docker_cache_azure_container_name`: used when the storage backend is "azure". Change it only if you want a specific container name. The default container name is "dockerregistry".
* `docker_cache_fs_mount_path`: used when the storage backend is "filesystem". Change it only if you want a specific path. The default is "/var/lib/registry".
* `docker_cache_remote_url`: the URL of the remote registry that the pull-through cache caches. Change it only if you want a remote registry other than Docker Hub. The default is "https://registry-1.docker.io/".
* `docker_cache_htpasswd`: base64-encoded htpasswd authorization info, used as an access control method. If you use htpasswd for authorization, it is best to protect it with SSL.

### `config.yaml` example with Azure Blob Storage

``` yaml
# ...

# Optional

#######################################################################
# OpenPAI Customized Settings #
#######################################################################
# enable_hived_scheduler: true
enable_docker_cache: true
docker_cache_storage_backend: "azure"
docker_cache_azure_account_name: "forexample"
docker_cache_azure_account_key: "forexample"
# docker_cache_azure_container_name: "dockerregistry"
# docker_cache_fs_mount_path: "/var/lib/registry"
# docker_cache_remote_url: "https://registry-1.docker.io"
# docker_cache_htpasswd: ""
# enable_marketplace: "true"

# ...

```

Make sure `enable_docker_cache` is set to `true` and complete the [installation](./installation-guide.md); the docker-cache service should then start normally.

### `config.yaml` example with the Linux file system

``` yaml
# ...

# Optional

#######################################################################
# OpenPAI Customized Settings #
#######################################################################
# enable_hived_scheduler: true
enable_docker_cache: true
docker_cache_storage_backend: "filesystem"
# docker_cache_azure_account_name: ""
# docker_cache_azure_account_key: ""
# docker_cache_azure_container_name: "dockerregistry"
docker_cache_fs_mount_path: "/var/lib/registry"
# docker_cache_remote_url: "https://registry-1.docker.io"
# docker_cache_htpasswd: ""
# enable_marketplace: "true"

# ...

```

Make sure `enable_docker_cache` is set to `true` and complete the [installation](./installation-guide.md); the docker-cache service should then start normally.

### htpasswd explained

The *htpasswd* authorization backend allows an [Apache htpasswd file](https://httpd.apache.org/docs/2.4/programs/htpasswd.html) to be used as the basic auth configuration. The password format supported by *htpasswd* here is *bcrypt*; entries with other hash types are ignored. The htpasswd file is loaded at startup; if it is invalid, the registry reports an error and does not start.

The docker-cache service brings in the htpasswd info as a k8s secret, so the content of the htpasswd file must be base64 encoded.

## Configure Docker Image Cache for a Deployed Cluster

For a cluster that is already deployed, enabling the docker-cache service does not require reinstalling the cluster. The recommended way is to modify `config.yaml` and upgrade with the following commands.

```bash
echo "pai" > cluster-id # "pai" is the default cluster-id; change it if you used a different one during deployment

# assume the workdir is pai
echo "Generating services configurations..."
python3 ./contrib/kubespray/script/openpai_generator.py -l ./contrib/kubespray/config/layout.yaml -c ./contrib/kubespray/config/config.yaml -o /cluster-configuration

echo "Pushing cluster config to k8s..."
./paictl.py config push -p /cluster-configuration -m service < cluster-id

echo "Start docker-cache service..."
./paictl.py service start -n docker-cache

echo "Performing docker-cache config distribution..."
ansible-playbook -i "${HOME}/pai-deploy/cluster-cfg/hosts.yml" docker-cache-config-distribute.yml || exit $?
```

### Configuration for a Custom Registry

If you want the OpenPAI cluster to use a custom registry, a simple way is to modify `./contrib/kubespray/docker-cache-config-distribute.yml`, the playbook responsible for modifying the Docker daemon config on every node in the cluster. By default, the playbook adds port 30500 of the kube-master node as the entry point of the docker-cache service. To use a custom registry, you only need to change `{{ hostvars[groups['kube-master'][0]]['ip'] }}:30500` in that file to the corresponding `<ip>:<port>` string.

```yaml
- hosts: all
  become: true
  become_user: root
  gather_facts: true
  roles:
    - role: '../roles/docker-cache/install'
      vars:
        enable_docker_cache: true
        docker_cache_host: "{{ hostvars[groups['kube-master'][0]]['ip'] }}:30500"
  tasks:
    - name: Restart service docker config from /etc/docker/daemon.json after update
      ansible.builtin.systemd:
        name: docker
```
|
|
@ -21,6 +21,7 @@ nav:
 - How to Set Up Storage: manual/cluster-admin/how-to-set-up-storage.md
 - How to Set Up Virtual Clusters: manual/cluster-admin/how-to-set-up-virtual-clusters.md
 - How to Add and Remove Nodes: manual/cluster-admin/how-to-add-and-remove-nodes.md
+- How to Set Up Docker Image Cache: manual/cluster-admin/how-to-set-up-docker-image-cache.md
 - How to Customize Cluster by Plugins: manual/cluster-admin/how-to-customize-cluster-by-plugins.md
 - How to Use Alert System: manual/cluster-admin/how-to-use-alert-system.md
 - Troubleshooting: manual/cluster-admin/troubleshooting.md
@ -21,6 +21,7 @@ nav:
 - How to Set Up Storage: manual/cluster-admin/how-to-set-up-storage.md
 - How to Set Up Virtual Clusters: manual/cluster-admin/how-to-set-up-virtual-clusters.md
 - How to Add and Remove Nodes: manual/cluster-admin/how-to-add-and-remove-nodes.md
+- How to Set Up Docker Image Cache: manual/cluster-admin/how-to-set-up-docker-image-cache.md
 - How to Customize Cluster by Plugins: manual/cluster-admin/how-to-customize-cluster-by-plugins.md
 - How to Use Alert System: manual/cluster-admin/how-to-use-alert-system.md
 - Troubleshooting: manual/cluster-admin/troubleshooting.md