Xiang Long 2021-03-09 13:19:59 +08:00 committed by GitHub
Parent 8e1bb28533
Commit 724961c538
No known key found for this signature
GPG Key ID: 4AEE18F83AFDEB23
7 changed files with 272 additions and 12 deletions


@ -16,9 +16,10 @@ This manual is for cluster administrators to learn the installation and uninstal
6. [How to Set Up Virtual Clusters](./how-to-set-up-virtual-clusters.md)
7. [How to Set Up Marketplace](./how-to-set-up-marketplace.md)
8. [How to Add and Remove Nodes](./how-to-add-and-remove-nodes.md)
9. [How to Customize Cluster by Plugins](./how-to-customize-cluster-by-plugins.md)
10. [How to Use Alert System](./how-to-use-alert-system.md)
11. [Troubleshooting](./troubleshooting.md)
12. [Recommended Practice](./recommended-practice.md)
13. [How to Uninstall OpenPAI](./how-to-uninstall-openpai.md)
14. [Upgrade Guide](./upgrade-guide.md)
9. [How to Set Up Docker Image Cache](./how-to-set-up-docker-image-cache.md)
10. [How to Customize Cluster by Plugins](./how-to-customize-cluster-by-plugins.md)
11. [How to Use Alert System](./how-to-use-alert-system.md)
12. [Troubleshooting](./troubleshooting.md)
13. [Recommended Practice](./recommended-practice.md)
14. [How to Uninstall OpenPAI](./how-to-uninstall-openpai.md)
15. [Upgrade Guide](./upgrade-guide.md)


@ -0,0 +1,123 @@
# How to Set Up Docker Image Cache
[Docker Image Cache](https://docs.docker.com/registry/recipes/mirror/), implemented as the `docker-cache` service in OpenPAI, helps admins avoid the [Docker Hub rate limit](https://www.docker.com/increase-rate-limits), which can leave service deployments or user-submitted jobs pending for a while. The cache is set up as a pull-through cache with either [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) or the Linux filesystem as its storage backend. In addition, a utility script distributes the docker-cache config to the nodes, so admins can easily switch to their own Docker registry or pull-through cache.
Docker Image Cache can be used in three ways:
1. Boot a cache service with Azure Blob Storage backend;
2. Boot a cache service with Linux file system backend;
3. Use a custom registry with the cluster.
## Set Up Docker Image Cache during Installation
During installation, the only change you need to make is in `config.yaml` (`contrib/kubespray/config/config.yaml`). The relevant settings are the ones containing `docker_cache` in the "OpenPAI Customized Settings" section.
* `enable_docker_cache`: set to `true` to enable the docker-cache service. The default is `false`, in which case none of the following parameters take effect.
* `docker_cache_storage_backend`: storage backend type, either "azure" for [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) or "filesystem" for the Linux filesystem.
* `docker_cache_azure_account_name`: required when the storage backend is "azure"; your [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) account name.
* `docker_cache_azure_account_key`: required when the storage backend is "azure"; your base64-encoded [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) account key.
* `docker_cache_azure_container_name`: used when the storage backend is "azure"; change it only if you want docker-cache to use a specific container. The default is "dockerregistry".
* `docker_cache_fs_mount_path`: used when the storage backend is "filesystem"; change it only if you want docker-cache to use a specific path. The default is "/var/lib/registry".
* `docker_cache_remote_url`: remote URL of the pull-through cache; change it only if you want to cache a remote Docker registry other than Docker Hub. The default is "https://registry-1.docker.io/".
* `docker_cache_htpasswd`: base64-encoded htpasswd auth info. Use it together with SSL as an access control method when docker-cache caches a private registry.
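The dependencies between these settings can be sketched as a small pre-flight check. This is only an illustration: the helper `check_docker_cache_settings` is hypothetical and not part of OpenPAI, it uses the key names that appear in the `config.yaml` examples, and it assumes that settings with a documented default may be omitted.

```python
# Hypothetical pre-flight check; not part of OpenPAI.
# Assumption: only settings without a documented default are mandatory.
REQUIRED_BY_BACKEND = {
    "azure": ["docker_cache_azure_account_name", "docker_cache_azure_account_key"],
    "filesystem": [],
}


def check_docker_cache_settings(cfg):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # When the cache is disabled, the remaining settings have no effect.
    if not cfg.get("enable_docker_cache", False):
        return problems
    backend = cfg.get("docker_cache_storage_backend")
    if backend not in REQUIRED_BY_BACKEND:
        problems.append(f"unknown storage backend: {backend!r}")
        return problems
    for key in REQUIRED_BY_BACKEND[backend]:
        if not cfg.get(key):
            problems.append(f"{key} is required when backend is {backend!r}")
    return problems


if __name__ == "__main__":
    example = {"enable_docker_cache": True, "docker_cache_storage_backend": "azure"}
    for problem in check_docker_cache_settings(example):
        print(problem)
```

Passing the parsed `config.yaml` as a dict keeps the sketch free of third-party YAML dependencies.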
### `config.yaml` example with Azure
``` yaml
# ...
# Optional
#######################################################################
# OpenPAI Customized Settings #
#######################################################################
# enable_hived_scheduler: true
enable_docker_cache: true
docker_cache_storage_backend: "azure"
docker_cache_azure_account_name: "forexample"
docker_cache_azure_account_key: "forexample"
# docker_cache_azure_container_name: "dockerregistry"
# docker_cache_fs_mount_path: "/var/lib/registry"
# docker_cache_remote_url: "https://registry-1.docker.io"
# docker_cache_htpasswd: ""
# enable_marketplace: "true"
# ...
```
Make sure `enable_docker_cache` is set to `true`, then finish the [installation](./installation-guide.md); the docker-cache service will be set up.
### `config.yaml` example with file system
``` yaml
# ...
# Optional
#######################################################################
# OpenPAI Customized Settings #
#######################################################################
# enable_hived_scheduler: true
enable_docker_cache: true
docker_cache_storage_backend: "filesystem"
# docker_cache_azure_account_name: ""
# docker_cache_azure_account_key: ""
# docker_cache_azure_container_name: "dockerregistry"
docker_cache_fs_mount_path: "/var/lib/registry"
# docker_cache_remote_url: "https://registry-1.docker.io"
# docker_cache_htpasswd: ""
# enable_marketplace: "true"
# ...
```
Make sure `enable_docker_cache` is set to `true`, then finish the [installation](./installation-guide.md); the docker-cache service will be set up.
### htpasswd explained
The *htpasswd* authentication backend allows you to configure basic authentication using an [Apache htpasswd file](https://httpd.apache.org/docs/2.4/programs/htpasswd.html).
The only supported password format is *bcrypt*. Entries with other hash types are ignored. The htpasswd file is loaded once, at startup. If the file is invalid, the registry will display an error and will not start.
The docker-cache service passes the htpasswd info in as a Kubernetes secret, so `docker_cache_htpasswd` must be the base64-encoded content of the htpasswd file.
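A minimal sketch of preparing that value is shown below. It assumes the htpasswd file content was already produced elsewhere (for example with `htpasswd -B`, which emits `$2y$` hashes); the helper name `encode_htpasswd` is hypothetical, and the sample hash is a placeholder rather than a real bcrypt hash. Unlike the registry, which silently ignores non-bcrypt entries, this sketch rejects them so that mistakes surface early.

```python
import base64

# bcrypt hashes start with one of these prefixes; `htpasswd -B` emits "$2y$".
BCRYPT_PREFIXES = ("$2a$", "$2b$", "$2y$")


def encode_htpasswd(content):
    """Validate htpasswd text and return it base64-encoded as a string."""
    for line in content.splitlines():
        if not line.strip():
            continue  # skip blank lines
        user, sep, hashed = line.partition(":")
        if not sep or not user:
            raise ValueError(f"malformed htpasswd line: {line!r}")
        if not hashed.startswith(BCRYPT_PREFIXES):
            raise ValueError(f"entry for {user!r} is not a bcrypt hash")
    return base64.b64encode(content.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Placeholder entry for illustration only; not a real bcrypt hash.
    sample = "admin:$2y$05$abcdefghijklmnopqrstuuABCDEFGHIJKLMNOPQRSTUVWXYZ01234\n"
    print(encode_htpasswd(sample))
```

The printed string is what would go into `docker_cache_htpasswd`.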
## Set Up Docker Image Cache for Deployed Cluster
If the cluster is already deployed, you do not need to reinstall it to enable the docker-cache service. The suggested way is to modify `config.yaml` and run the following commands to upgrade.
```bash
echo "pai" > cluster-id # "pai" is the default cluster-id; change it if you used a different one during deployment
# assume the working directory is the root of the pai repository
echo "Generating services configurations..."
python3 ./contrib/kubespray/script/openpai_generator.py -l ./contrib/kubespray/config/layout.yaml -c ./contrib/kubespray/config/config.yaml -o /cluster-configuration
echo "Pushing cluster config to k8s..."
./paictl.py config push -p /cluster-configuration -m service < cluster-id
echo "Start docker-cache service..."
./paictl.py service start -n docker-cache
echo "Performing docker-cache config distribution..."
ansible-playbook -i ${HOME}/pai-deploy/cluster-cfg/hosts.yml docker-cache-config-distribute.yml || exit $?
```
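Once the commands finish, one way to confirm that the cache answers is to query the Registry HTTP API v2 base endpoint. The sketch below assumes the service is reachable over plain HTTP on port `30500` of the `kube-master` node and that no htpasswd auth is configured (with auth enabled, `/v2/` answers 401 and the check returns `False`). The helper names and the example address are hypothetical.

```python
import urllib.request


def registry_base_url(host, scheme="http"):
    """Build the Registry HTTP API v2 base URL for an "<ip>:<port>" host."""
    return f"{scheme}://{host}/v2/"


def registry_is_up(host, timeout=5.0):
    """Return True if GET /v2/ answers with HTTP 200."""
    try:
        with urllib.request.urlopen(registry_base_url(host), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError (e.g. 401 when auth is on) are OSError subclasses.
        return False


if __name__ == "__main__":
    # Hypothetical kube-master address; replace with your own.
    print(registry_is_up("10.0.0.1:30500"))
```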
### Use Customized Registry Configuration
If you want to use a registry deployed separately from the OpenPAI cluster, a simple way is to modify `./contrib/kubespray/docker-cache-config-distribute.yml`, the playbook that modifies the Docker daemon config on each node. By default the playbook uses port `30500` of the `kube-master` node. To use a custom registry, the only change needed is to replace `{{ hostvars[groups['kube-master'][0]]['ip'] }}:30500` with your registry's `<ip>:<port>` string.
```yaml
- hosts: all
  become: true
  become_user: root
  gather_facts: true
  roles:
    - role: '../roles/docker-cache/install'
      vars:
        enable_docker_cache: true
        docker_cache_host: "{{ hostvars[groups['kube-master'][0]]['ip'] }}:30500"
  tasks:
    - name: Restart service docker config from /etc/docker/daemon.json after update
      ansible.builtin.systemd:
        name: docker
        state: restarted  # assumed from the task name
```
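To make the effect on each node more concrete, here is a rough sketch of the kind of change a pull-through mirror needs in `/etc/docker/daemon.json`. It assumes the role adds the host to Docker's `registry-mirrors` list over plain HTTP; what the `docker-cache/install` role actually writes is not shown here, and the helper name and example address are hypothetical.

```python
import json


def add_registry_mirror(daemon_cfg, host, scheme="http"):
    """Return a copy of a daemon.json dict with the mirror added once."""
    updated = dict(daemon_cfg)
    # Copy the list so the caller's config is never mutated.
    mirrors = list(updated.get("registry-mirrors", []))
    url = f"{scheme}://{host}"
    if url not in mirrors:
        mirrors.append(url)
    updated["registry-mirrors"] = mirrors
    return updated


# Hypothetical custom registry address in "<ip>:<port>" form.
print(json.dumps(add_registry_mirror({"log-driver": "json-file"}, "192.0.2.10:5000"), indent=2))
```

Keeping the update idempotent means that re-running the distribution does not pile up duplicate mirror entries.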


@ -224,10 +224,20 @@ docker_image_tag: v1.5.0
# Optional
#######################################################################
# OpenPAI Customized Settings #
#######################################################################
# enable_hived_scheduler: true
# enable_docker_cache: true
# docker_cache_storage_backend: "azure" # or "filesystem"
# docker_cache_azure_account_name: ""
# docker_cache_azure_account_key: ""
# docker_cache_azure_container_name: "dockerregistry"
# docker_cache_fs_mount_path: "/var/lib/registry"
# docker_cache_remote_url: "https://registry-1.docker.io"
# docker_cache_htpasswd: ""
# enable_marketplace: "true"
#############################################
# Ansible-playbooks' inventory hosts' vars. #


@ -13,9 +13,10 @@ OpenPAI是一个提供完整人工智能模型训练和资源管理能力的开
5. [How to Set Up Data Storage](./how-to-set-up-storage.md)
6. [How to Set Up Virtual Clusters](./how-to-set-up-virtual-clusters.md)
7. [How to Add and Remove Nodes](./how-to-add-and-remove-nodes.md)
8. [How to Customize Cluster by Plugins](./how-to-customize-cluster-by-plugins.md)
9. [How to Use Alert System](./how-to-use-alert-system.md)
10. [Troubleshooting](./troubleshooting.md)
11. [Recommended Practice](./recommended-practice.md)
12. [How to Uninstall OpenPAI](./how-to-uninstall-openpai.md)
13. [Upgrade Guide](./upgrade-guide.md)
8. [How to Set Up Docker Image Cache](./how-to-set-up-docker-image-cache.md)
9. [How to Customize Cluster by Plugins](./how-to-customize-cluster-by-plugins.md)
10. [How to Use Alert System](./how-to-use-alert-system.md)
11. [Troubleshooting](./troubleshooting.md)
12. [Recommended Practice](./recommended-practice.md)
13. [How to Uninstall OpenPAI](./how-to-uninstall-openpai.md)
14. [Upgrade Guide](./upgrade-guide.md)


@ -0,0 +1,123 @@
# How to Set Up Docker Image Cache
[Docker Image Cache](https://docs.docker.com/registry/recipes/mirror/), implemented in OpenPAI as the `docker-cache` service, helps admins avoid the [Docker Hub rate limit](https://www.docker.com/increase-rate-limits). Once the limit is exceeded, service deployments or user-submitted jobs are left waiting. Docker Image Cache is configured as a pull-through cache with [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) or the Linux filesystem as its storage backend. In addition, with the provided docker-cache config distribution script, admins can easily use their own Docker registry or pull-through cache.
Docker Image Cache can be used in three ways:
1. Start a cache service with Azure Blob Storage as the storage backend;
2. Start a cache service with the Linux filesystem as the storage backend;
3. Use a custom registry.
## Set Up Docker Image Cache during Installation
During installation, enabling Docker Image Cache only requires modifying `config.yaml` (`contrib/kubespray/config/config.yaml`). The relevant settings are the ones containing `docker_cache` in the "OpenPAI Customized Settings" section.
* `enable_docker_cache`: set to `true` to use the docker-cache service. The default is `false`, which disables all of the following settings.
* `docker_cache_storage_backend`: storage backend type selector; "azure" uses [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/), "filesystem" uses the Linux filesystem.
* `docker_cache_azure_account_name`: required when the storage backend is "azure"; your Azure Blob Storage account name.
* `docker_cache_azure_account_key`: required when the storage backend is "azure"; your base64-encoded Azure Blob Storage account key.
* `docker_cache_azure_container_name`: used when the storage backend is "azure"; change it only if you want a specific container name. The default container name is "dockerregistry".
* `docker_cache_fs_mount_path`: used when the storage backend is "filesystem"; change it only if you want a specific path. The default is "/var/lib/registry".
* `docker_cache_remote_url`: URL of the remote registry that the pull-through cache caches; change it only if the remote registry is not Docker Hub. The default is "https://registry-1.docker.io/".
* `docker_cache_htpasswd`: base64-encoded htpasswd auth info used as an access control method. If you use htpasswd for authentication, it is best to protect it with SSL.
### `config.yaml` example with Azure Blob Storage
``` yaml
# ...
# Optional
#######################################################################
# OpenPAI Customized Settings #
#######################################################################
# enable_hived_scheduler: true
enable_docker_cache: true
docker_cache_storage_backend: "azure"
docker_cache_azure_account_name: "forexample"
docker_cache_azure_account_key: "forexample"
# docker_cache_azure_container_name: "dockerregistry"
# docker_cache_fs_mount_path: "/var/lib/registry"
# docker_cache_remote_url: "https://registry-1.docker.io"
# docker_cache_htpasswd: ""
# enable_marketplace: "true"
# ...
```
Make sure `enable_docker_cache` is set to `true`, then finish the [installation](./installation-guide.md); the docker-cache service should then start normally.
### `config.yaml` example with the Linux filesystem
``` yaml
# ...
# Optional
#######################################################################
# OpenPAI Customized Settings #
#######################################################################
# enable_hived_scheduler: true
enable_docker_cache: true
docker_cache_storage_backend: "filesystem"
# docker_cache_azure_account_name: ""
# docker_cache_azure_account_key: ""
# docker_cache_azure_container_name: "dockerregistry"
docker_cache_fs_mount_path: "/var/lib/registry"
# docker_cache_remote_url: "https://registry-1.docker.io"
# docker_cache_htpasswd: ""
# enable_marketplace: "true"
# ...
```
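Make sure `enable_docker_cache` is set to `true`, then finish the [installation](./installation-guide.md); the docker-cache service should then start normally.
### htpasswd explained
The *htpasswd* authentication backend lets you configure basic auth using an [Apache htpasswd file](https://httpd.apache.org/docs/2.4/programs/htpasswd.html). The only password format *htpasswd* supports here is *bcrypt*. Entries with other hash types are ignored. The htpasswd file is loaded at startup; if the file is invalid, the registry displays an error and does not start.
The docker-cache service brings the htpasswd info in as a k8s secret, so the htpasswd file content must be base64-encoded.
## Set Up Docker Image Cache for Deployed Cluster
If the cluster is already deployed, enabling the docker-cache service does not require reinstalling the cluster. The recommended way is to modify `config.yaml` and upgrade with the following commands.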
```bash
echo "pai" > cluster-id # "pai" is default cluster-id, need to change if you changed in deployment
# assume the workdir is pai
echo "Generating services configurations..."
python3 ./contrib/kubespray/script/openpai_generator.py -l ./contrib/kubespray/config/layout.yaml -c ./contrib/kubespray/config/config.yaml -o /cluster-configuration
echo "Pushing cluster config to k8s..."
./paictl.py config push -p /cluster-configuration -m service < cluster-id
echo "Start docker-cache service..."
./paictl.py service start -n docker-cache
echo "Performing docker-cache config distribution..."
ansible-playbook -i ${HOME}/pai-deploy/cluster-cfg/hosts.yml docker-cache-config-distribute.yml || exit $?
```
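### Use Customized Registry Configuration
If you want the OpenPAI cluster to use a custom registry, a simple way is to modify `./contrib/kubespray/docker-cache-config-distribute.yml`, the playbook that modifies the Docker daemon config on every node in the cluster. By default, the playbook adds port 30500 of the kube-master node as the entry point of the docker-cache service. To use a custom registry, you only need to change `{{ hostvars[groups['kube-master'][0]]['ip'] }}:30500` in that file to the corresponding `<ip>:<port>` string.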
```yaml
- hosts: all
  become: true
  become_user: root
  gather_facts: true
  roles:
    - role: '../roles/docker-cache/install'
      vars:
        enable_docker_cache: true
        docker_cache_host: "{{ hostvars[groups['kube-master'][0]]['ip'] }}:30500"
  tasks:
    - name: Restart service docker config from /etc/docker/daemon.json after update
      ansible.builtin.systemd:
        name: docker
        state: restarted  # assumed from the task name
```


@ -21,6 +21,7 @@ nav:
- How to Set Up Storage: manual/cluster-admin/how-to-set-up-storage.md
- How to Set Up Virtual Clusters: manual/cluster-admin/how-to-set-up-virtual-clusters.md
- How to Add and Remove Nodes: manual/cluster-admin/how-to-add-and-remove-nodes.md
- How to Set Up Docker Image Cache: manual/cluster-admin/how-to-set-up-docker-image-cache.md
- How to Customize Cluster by Plugins: manual/cluster-admin/how-to-customize-cluster-by-plugins.md
- How to Use Alert System: manual/cluster-admin/how-to-use-alert-system.md
- Troubleshooting: manual/cluster-admin/troubleshooting.md


@ -21,6 +21,7 @@ nav:
- How to Set Up Data Storage: manual/cluster-admin/how-to-set-up-storage.md
- How to Set Up Virtual Clusters: manual/cluster-admin/how-to-set-up-virtual-clusters.md
- How to Add and Remove Nodes: manual/cluster-admin/how-to-add-and-remove-nodes.md
- How to Set Up Docker Image Cache: manual/cluster-admin/how-to-set-up-docker-image-cache.md
- How to Customize Cluster by Plugins: manual/cluster-admin/how-to-customize-cluster-by-plugins.md
- How to Use Alert System: manual/cluster-admin/how-to-use-alert-system.md
- Troubleshooting: manual/cluster-admin/troubleshooting.md