From 724961c5383f9bba57e1984fe8ecf721ac227888 Mon Sep 17 00:00:00 2001
From: Xiang Long
Date: Tue, 9 Mar 2021 13:19:59 +0800
Subject: [PATCH] Add docker cache doc (#5349)

---
 docs/manual/cluster-admin/README.md       |  13 +-
 .../how-to-set-up-docker-image-cache.md   | 123 ++++++++++++++++++
 .../cluster-admin/installation-guide.md   |  10 ++
 docs_zh_CN/manual/cluster-admin/README.md |  13 +-
 .../how-to-set-up-docker-image-cache.md   | 123 ++++++++++++++++++
 mkdocs.yml                                |   1 +
 mkdocs_zh_CN.yml                          |   1 +
 7 files changed, 272 insertions(+), 12 deletions(-)
 create mode 100644 docs/manual/cluster-admin/how-to-set-up-docker-image-cache.md
 create mode 100644 docs_zh_CN/manual/cluster-admin/how-to-set-up-docker-image-cache.md

diff --git a/docs/manual/cluster-admin/README.md b/docs/manual/cluster-admin/README.md
index 177f2df76..0b4403b0f 100644
--- a/docs/manual/cluster-admin/README.md
+++ b/docs/manual/cluster-admin/README.md
@@ -16,9 +16,10 @@ This manual is for cluster administrators to learn the installation and uninstal
 6. [How to Set Up Virtual Clusters](./how-to-set-up-virtual-clusters.md)
 7. [How to Set Up Marketplace](./how-to-set-up-marketplace.md)
 8. [How to Add and Remove Nodes](./how-to-add-and-remove-nodes.md)
-9. [How to Customize Cluster by Plugins](./how-to-customize-cluster-by-plugins.md)
-10. [How to Use Alert System](./how-to-use-alert-system.md)
-11. [Troubleshooting](./troubleshooting.md)
-12. [Recommended Practice](./recommended-practice.md)
-13. [How to Uninstall OpenPAI](./how-to-uninstall-openpai.md)
-14. [Upgrade Guide](./upgrade-guide.md)
+9. [How to Set Up Docker Image Cache](./how-to-set-up-docker-image-cache.md)
+10. [How to Customize Cluster by Plugins](./how-to-customize-cluster-by-plugins.md)
+11. [How to Use Alert System](./how-to-use-alert-system.md)
+12. [Troubleshooting](./troubleshooting.md)
+13. [Recommended Practice](./recommended-practice.md)
+14. [How to Uninstall OpenPAI](./how-to-uninstall-openpai.md)
+15.
[Upgrade Guide](./upgrade-guide.md)
diff --git a/docs/manual/cluster-admin/how-to-set-up-docker-image-cache.md b/docs/manual/cluster-admin/how-to-set-up-docker-image-cache.md
new file mode 100644
index 000000000..01c26ca26
--- /dev/null
+++ b/docs/manual/cluster-admin/how-to-set-up-docker-image-cache.md
@@ -0,0 +1,123 @@
+# How to Set Up Docker Image Cache
+
+[Docker Image Cache](https://docs.docker.com/registry/recipes/mirror/), implemented as the docker-cache service in OpenPAI, helps admins avoid the [Docker Hub rate limit](https://www.docker.com/increase-rate-limits), which can leave service deployments or user-submitted jobs pending for a while. Docker Image Cache is set up as a pull-through cache with [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) or the Linux file system as its storage backend. Furthermore, with the utility script that distributes the docker-cache config, admins can easily switch to their own Docker registry or pull-through cache.
+
+Docker Image Cache can be used in three ways:
+1. Boot a cache service with Azure Blob Storage backend;
+2. Boot a cache service with Linux file system backend;
+3. Use a custom registry with the cluster.
+
+## Set Up Docker Image Cache during Installation
+
+During installation, the only change you need to make is to edit `config.yaml` in `contrib/kubespray/config.yaml`. The relevant settings are those containing the "docker_cache" substring in the "OpenPAI Customized Settings" section.
+
+* `enable_docker_cache`: set to true to enable the docker-cache service; the default is false, in which case none of the following parameters take effect.
+* `docker_cache_storage_backend`: storage backend type selector; "azure" is for [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/), "filesystem" is for the Linux file system.
+* `docker_cache_azure_account_name`: required when the storage backend is "azure"; should be your [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) account name.
+* `docker_cache_azure_account_key`: required when the storage backend is "azure"; should be your base64-encoded [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) account key.
+* `docker_cache_azure_container_name`: used when the storage backend is "azure"; modify it only if you want to specify the container name that docker-cache uses; the default is "dockerregistry".
+* `docker_cache_fs_mount_path`: used when the storage backend is "filesystem"; modify it only if you want to specify the path that docker-cache uses; the default is "/var/lib/registry".
+* `docker_cache_remote_url`: pull-through cache remote URL; modify it only if you want to cache a remote Docker registry other than Docker Hub; the default is "https://registry-1.docker.io/".
+* `docker_cache_htpasswd`: base64-encoded htpasswd auth info; should be used together with SSL as an access control method when docker-cache caches a private registry.
+
+### `config.yaml` example with Azure
+
+``` yaml
+# ...
+
+# Optional
+
+#######################################################################
+#                    OpenPAI Customized Settings                      #
+#######################################################################
+# enable_hived_scheduler: true
+enable_docker_cache: true
+docker_cache_storage_backend: "azure"
+docker_cache_azure_account_name: "forexample"
+docker_cache_azure_account_key: "forexample"
+# docker_cache_azure_container_name: "dockerregistry"
+# docker_cache_fs_mount_path: "/var/lib/registry"
+# docker_cache_remote_url: "https://registry-1.docker.io"
+# docker_cache_htpasswd: ""
+# enable_marketplace: "true"
+
+# ...
+
+```
+
+Make sure `enable_docker_cache` is set to `true` and finish the [installation](./installation-guide.md); the docker-cache service will then be set up.
+
+### `config.yaml` example with file system
+
+``` yaml
+# ...
+
+# Optional
+
+#######################################################################
+#                    OpenPAI Customized Settings                      #
+#######################################################################
+# enable_hived_scheduler: true
+enable_docker_cache: true
+docker_cache_storage_backend: "filesystem"
+# docker_cache_azure_account_name: ""
+# docker_cache_azure_account_key: ""
+# docker_cache_azure_container_name: "dockerregistry"
+docker_cache_fs_mount_path: "/var/lib/registry"
+# docker_cache_remote_url: "https://registry-1.docker.io"
+# docker_cache_htpasswd: ""
+# enable_marketplace: "true"
+
+# ...
+
+```
+
+Make sure `enable_docker_cache` is set to `true` and finish the [installation](./installation-guide.md); the docker-cache service will then be set up.
+
+### htpasswd explained
+
+The *htpasswd* authentication backend allows you to configure basic authentication using an [Apache htpasswd file](https://httpd.apache.org/docs/2.4/programs/htpasswd.html).
+The only supported password format is *bcrypt*. Entries with other hash types are ignored. The htpasswd file is loaded once, at startup. If the file is invalid, the registry will display an error and will not start.
+
+The docker-cache service stores the htpasswd info as a k8s secret, which means `docker_cache_htpasswd` needs to be the base64-encoded content of the htpasswd file.
+
+## Set Up Docker Image Cache for Deployed Cluster
+
+If you have already deployed the cluster, there is no need to reinstall it to enable the docker-cache service. The suggested way is to modify `config.yaml` and use the following commands to upgrade.
+
+```bash
+echo "pai" > cluster-id # "pai" is the default cluster-id; change it if you used a different one during deployment
+
+# assume the workdir is pai
+echo "Generating services configurations..."
+python3 ./contrib/kubespray/script/openpai_generator.py -l ./contrib/kubespray/config/layout.yaml -c ./contrib/kubespray/config/config.yaml -o /cluster-configuration
+
+echo "Pushing cluster config to k8s..."
+./paictl.py config push -p /cluster-configuration -m service < cluster-id
+
+echo "Start docker-cache service..."
+./paictl.py service start -n docker-cache
+
+echo "Performing docker-cache config distribution..."
+ansible-playbook -i ${HOME}/pai-deploy/cluster-cfg/hosts.yml docker-cache-config-distribute.yml || exit $?
+```
+
+### Use Customized Registry Configuration
+
+If you want to use a registry deployed separately from the OpenPAI cluster, a simple way is to modify `./contrib/kubespray/docker-cache-config-distribute.yml`, a playbook that modifies the Docker daemon config on each node. By default, the playbook uses port `30500` of the `kube-master` node. To use a customized registry, the only thing that needs to change is to replace `{{ hostvars[groups['kube-master'][0]]['ip'] }}:30500` with the custom registry's `:` string.
+
+```yaml
+- hosts: all
+  become: true
+  become_user: root
+  gather_facts: true
+  roles:
+    - role: '../roles/docker-cache/install'
+      vars:
+        enable_docker_cache: true
+        docker_cache_host: "{{ hostvars[groups['kube-master'][0]]['ip'] }}:30500"
+  tasks:
+    - name: Restart service docker config from /etc/docker/daemon.json after update
+      ansible.builtin.systemd:
+        name: docker
+```
\ No newline at end of file
diff --git a/docs/manual/cluster-admin/installation-guide.md b/docs/manual/cluster-admin/installation-guide.md
index 54d243e17..a17125d57 100644
--- a/docs/manual/cluster-admin/installation-guide.md
+++ b/docs/manual/cluster-admin/installation-guide.md
@@ -224,10 +224,20 @@ docker_image_tag: v1.5.0
 # Optional
+
 #######################################################################
 #                    OpenPAI Customized Settings                      #
 #######################################################################
 # enable_hived_scheduler: true
+# enable_docker_cache: true
+# docker_cache_storage_backend: "azure" # or "filesystem"
+# docker_cache_azure_account_name: ""
+# docker_cache_azure_account_key: ""
+# docker_cache_azure_container_name: "dockerregistry"
+# docker_cache_fs_mount_path: "/var/lib/registry"
+# docker_cache_remote_url: "https://registry-1.docker.io"
+# docker_cache_htpasswd: ""
+# enable_marketplace: "true"
 #############################################
 # Ansible-playbooks' inventory hosts' vars. #
diff --git a/docs_zh_CN/manual/cluster-admin/README.md b/docs_zh_CN/manual/cluster-admin/README.md
index 7b7284d2f..b7cdf45c4 100644
--- a/docs_zh_CN/manual/cluster-admin/README.md
+++ b/docs_zh_CN/manual/cluster-admin/README.md
@@ -13,9 +13,10 @@ OpenPAI是一个提供完整人工智能模型训练和资源管理能力的开
 5. [如何设置数据存储](./how-to-set-up-storage.md)
 6. [如何设置虚拟集群](./how-to-set-up-virtual-clusters.md)
 7. [如何添加和移除结点](./how-to-add-and-remove-nodes.md)
-8. [如何使用插件定制集群](./how-to-customize-cluster-by-plugins.md)
-9. [如何使用报警系统](./how-to-use-alert-system.md)
-10. [故障排查](./troubleshooting.md)
-11.
[推荐实践](./recommended-practice.md)
-12. [如何卸载OpenPAI](./how-to-uninstall-openpai.md)
-13. [升级指南](./upgrade-guide.md)
+8. [如何设置 Docker 镜像缓存](./how-to-set-up-docker-image-cache.md)
+9. [如何使用插件定制集群](./how-to-customize-cluster-by-plugins.md)
+10. [如何使用报警系统](./how-to-use-alert-system.md)
+11. [故障排查](./troubleshooting.md)
+12. [推荐实践](./recommended-practice.md)
+13. [如何卸载OpenPAI](./how-to-uninstall-openpai.md)
+14. [升级指南](./upgrade-guide.md)
diff --git a/docs_zh_CN/manual/cluster-admin/how-to-set-up-docker-image-cache.md b/docs_zh_CN/manual/cluster-admin/how-to-set-up-docker-image-cache.md
new file mode 100644
index 000000000..c2fc0f016
--- /dev/null
+++ b/docs_zh_CN/manual/cluster-admin/how-to-set-up-docker-image-cache.md
@@ -0,0 +1,123 @@
+# 如何设置 Docker 镜像缓存
+
+[Docker 镜像缓存](https://docs.docker.com/registry/recipes/mirror/), 在 OpenPAI 中实现为 `docker-cache` 服务, 可以帮助 admin 避免 [Docker Hub rate limit](https://www.docker.com/increase-rate-limits)。Docker Hub rate limit 会造成部署服务或用户提交任务在超过限制时等待。Docker 镜像缓存被配置为一个以 [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) 或 Linux 文件系统为存储后端的 pull-through 缓存。此外, 通过提供的 docker-cache 配置分发脚本, admin 可以方便地使用自己的 docker registry 或者 pull-through cache。
+
+Docker 镜像缓存提供了三种使用方式:
+1. 启动一个使用 Azure Blob Storage 作为存储后端的缓存服务;
+2. 启动一个使用 Linux 文件系统作为存储后端的缓存服务;
+3. 使用自定义的 registry。
+
+## 安装时配置 Docker 镜像缓存
+
+During installation, the only change you need to make is to edit `config.yaml` in `contrib/kubespray/config.yaml`. The relevant settings are those containing the "docker_cache" substring in the "OpenPAI Customized Settings" section.
+在安装时,启用 Docker 镜像缓存只需要修改 `contrib/kubespray/config.yaml` 中的 `config.yaml`。"OpenPAI Customized Settings"段中,有"docker_cache"字段的是相关配置。
+
+* `enable_docker_cache`: 如果希望使用 docker-cache 服务需要设置为 true,默认为 false 并让后续的所有其它配置失效。
+* `docker_cache_storage_backend`: 存储后端类型选择参数, "azure" 使用 [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/), "filesystem" 使用 Linux 文件系统。
+* `docker_cache_azure_account_name`: 在存储后端类型为 "azure" 时必须填写,内容为你的 azure blob storage account name。
+* `docker_cache_azure_account_key`: 在存储后端类型为 "azure" 时必须填写,内容为 azure blob storage base64 encoded account key。
+* `docker_cache_azure_container_name`: 在存储后端类型为 "azure" 时使用,在修改为特定的 container 名称时才需要修改,默认 container 名称为 "dockerregistry"。
+* `docker_cache_fs_mount_path`: 在存储后端类型为 "filesystem" 时使用, 在修改为特定的路径时才需要修改,默认为 "/var/lib/registry"。
+* `docker_cache_remote_url`: pull-through cache 所缓存的远程 registry 链接, 在修改为非 Docker Hub 的远程 registry 时才需要修改,默认为 "https://registry-1.docker.io/"。
+* `docker_cache_htpasswd`: base64 编码的 htpasswd 授权信息作为访问控制方法,如果使用 htpasswd 作为授权方式最好提供 ssl 保护。
+
+### 使用 Azure Blob Storage 的 `config.yaml` 示例
+
+``` yaml
+# ...
+
+# Optional
+
+#######################################################################
+#                    OpenPAI Customized Settings                      #
+#######################################################################
+# enable_hived_scheduler: true
+enable_docker_cache: true
+docker_cache_storage_backend: "azure"
+docker_cache_azure_account_name: "forexample"
+docker_cache_azure_account_key: "forexample"
+# docker_cache_azure_container_name: "dockerregistry"
+# docker_cache_fs_mount_path: "/var/lib/registry"
+# docker_cache_remote_url: "https://registry-1.docker.io"
+# docker_cache_htpasswd: ""
+# enable_marketplace: "true"
+
+# ...
+
+```
+
+确保 `enable_docker_cache` 配置为 `true`,并完成[安装](./installation-guide.md),docker-cache 服务应该就可以正常启动了。
+
+### 使用 Linux 文件系统的 `config.yaml` 示例
+
+``` yaml
+# ...
+
+# Optional
+
+#######################################################################
+#                    OpenPAI Customized Settings                      #
+#######################################################################
+# enable_hived_scheduler: true
+enable_docker_cache: true
+docker_cache_storage_backend: "filesystem"
+# docker_cache_azure_account_name: ""
+# docker_cache_azure_account_key: ""
+# docker_cache_azure_container_name: "dockerregistry"
+docker_cache_fs_mount_path: "/var/lib/registry"
+# docker_cache_remote_url: "https://registry-1.docker.io"
+# docker_cache_htpasswd: ""
+# enable_marketplace: "true"
+
+# ...
+
+```
+
+确保 `enable_docker_cache` 配置为 `true`,并完成[安装](./installation-guide.md),docker-cache 服务应该就可以正常启动了。
+
+### htpasswd 解释
+
+*htpasswd* 授权后端允许使用 [Apache htpasswd file](https://httpd.apache.org/docs/2.4/programs/htpasswd.html) 作为 basic auth 的配置。*htpasswd* 唯一支持的 password 格式是 *bcrypt*。其它 hash 类别的表项会被忽略。htpasswd 文件仅在启动时加载一次,如果文件无效,registry 会显示错误并且不会启动。
+
+在 docker-cache 服务中,我们将 htpasswd 信息作为 k8s secret 引入,因此 `docker_cache_htpasswd` 需要填写经过 base64 编码的 htpasswd 文件内容。
+
+## 为已部署的集群配置 Docker 镜像缓存
+
+对于已经部署的集群,启用 docker-cache 服务并不需要重新安装集群。更推荐的方式是修改`config.yaml`,并通过如下命令升级。
+
+```bash
+echo "pai" > cluster-id # "pai" is the default cluster-id; change it if you used a different one during deployment
+
+# assume the workdir is pai
+echo "Generating services configurations..."
+python3 ./contrib/kubespray/script/openpai_generator.py -l ./contrib/kubespray/config/layout.yaml -c ./contrib/kubespray/config/config.yaml -o /cluster-configuration
+
+echo "Pushing cluster config to k8s..."
+./paictl.py config push -p /cluster-configuration -m service < cluster-id
+
+echo "Start docker-cache service..."
+./paictl.py service start -n docker-cache
+
+echo "Performing docker-cache config distribution..."
+ansible-playbook -i ${HOME}/pai-deploy/cluster-cfg/hosts.yml docker-cache-config-distribute.yml || exit $?
+```
+
+### 使用自定义 registry 的配置
+
+对于希望 OpenPAI 集群使用自定义的 registry 的用户,一个简单的方式是修改`./contrib/kubespray/docker-cache-config-distribute.yml`,该 playbook 负责修改集群内每个节点的 docker daemon 配置。在默认设置下,该 playbook 会添加 kube-master 节点的 30500 端口作为 docker-cache service 的入口。想使用自定义的 registry,仅需要修改该文件中的 `{{ hostvars[groups['kube-master'][0]]['ip'] }}:30500` 为相应的 `:` 字符串即可。
+
+```yaml
+- hosts: all
+  become: true
+  become_user: root
+  gather_facts: true
+  roles:
+    - role: '../roles/docker-cache/install'
+      vars:
+        enable_docker_cache: true
+        docker_cache_host: "{{ hostvars[groups['kube-master'][0]]['ip'] }}:30500"
+  tasks:
+    - name: Restart service docker config from /etc/docker/daemon.json after update
+      ansible.builtin.systemd:
+        name: docker
+```
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index c34a69dea..7a2ea71b8 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -21,6 +21,7 @@ nav:
     - How to Set Up Storage: manual/cluster-admin/how-to-set-up-storage.md
     - How to Set Up Virtual Clusters: manual/cluster-admin/how-to-set-up-virtual-clusters.md
     - How to Add and Remove Nodes: manual/cluster-admin/how-to-add-and-remove-nodes.md
+    - How to Set Up Docker Image Cache: manual/cluster-admin/how-to-set-up-docker-image-cache.md
     - How to Customize Cluster by Plugins: manual/cluster-admin/how-to-customize-cluster-by-plugins.md
     - How to Use Alert System: manual/cluster-admin/how-to-use-alert-system.md
     - Troubleshooting: manual/cluster-admin/troubleshooting.md
diff --git a/mkdocs_zh_CN.yml b/mkdocs_zh_CN.yml
index 68a9ccea5..d44ad275f 100644
--- a/mkdocs_zh_CN.yml
+++ b/mkdocs_zh_CN.yml
@@ -21,6 +21,7 @@ nav:
     - 如何设置数据存储: manual/cluster-admin/how-to-set-up-storage.md
     - 如何设置虚拟集群: manual/cluster-admin/how-to-set-up-virtual-clusters.md
     - 如何添加和移除结点: manual/cluster-admin/how-to-add-and-remove-nodes.md
+    - 如何设置 Docker 镜像缓存: manual/cluster-admin/how-to-set-up-docker-image-cache.md
     - 如何使用插件定制集群: manual/cluster-admin/how-to-customize-cluster-by-plugins.md
     - 如何使用报警系统:
manual/cluster-admin/how-to-use-alert-system.md
     - 故障排查: manual/cluster-admin/troubleshooting.md