mirror of https://github.com/microsoft/pai.git
Update docs for Cluster Autoscaler on AKS Engine (#5057)
Update docs for Cluster Autoscaler on AKS Engine.
This commit is contained in:
Parent
5fbefb87c6
Commit
1e9580e472
|
@ -1,56 +1,76 @@
|
||||||
#### Install Necessary Package.
|
# Cluster Autoscaler on AKS Engine
|
||||||
|
|
||||||
- [ Install Azure CLI ](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest)
|
[AKS Engine](https://github.com/Azure/aks-engine) is a tool to help you provision a self-managed Kubernetes cluster on Azure,
|
||||||
- [ Install AKS-Engine ](https://github.com/Azure/aks-engine/blob/master/docs/tutorials/quickstart.md#install-the-aks-engine-binary)
|
while [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) is another tool that automatically adjusts the size of the Kubernetes cluster.
|
||||||
|
The Cluster Autoscaler on Azure dynamically scales Kubernetes worker nodes.
|
||||||
|
|
||||||
#### Create Resource Group
|
This contrib helps you deploy an OpenPAI cluster on Azure using AKS Engine and run Cluster Autoscaler as a deployment in your cluster.
|
||||||
|
|
||||||
- Solution A [ Azure Portal ](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/manage-resource-groups-portal#create-resource-groups) (Recommended)
|
|
||||||
- Solution B [ Azure CLI ](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/manage-resource-groups-cli#create-resource-groups)
|
|
||||||
|
|
||||||
Remember the following parameters
|
## Preparations on Azure
|
||||||
|
|
||||||
- subscription id: ```${subscriptionId}```
|
1. Install Dependencies
|
||||||
- resource groupname: ```${resourcegroup}```
|
|
||||||
- location: ```${location}```
|
|
||||||
|
|
||||||
#### Create Service Principle
|
1. Install [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest)
|
||||||
|
2. Install [AKS Engine](https://github.com/Azure/aks-engine/blob/master/docs/tutorials/quickstart.md#install-the-aks-engine-binary)
|
||||||
|
|
||||||
```bash
|
2. Create resource group
|
||||||
az ad sp create-for-rbac --skip-assignment --name ${service-principal-name}
|
|
||||||
```
|
|
||||||
|
|
||||||
If the command success, the output will like the following example.
|
There are two options for creating a resource group in your subscription:
|
||||||
|
* It's recommended to use [Azure Portal](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/manage-resource-groups-portal#create-resource-groups)
|
||||||
|
* You can also use [Azure CLI](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/manage-resource-groups-cli#create-resource-groups)
|
||||||
|
|
||||||
```json
|
Note the following parameters, which will be used later:
|
||||||
{
|
* subscription id `${subscriptionId}`
|
||||||
"appId": "559513bd-0c19-4c1a-87cd-851a26afd5fc",
|
* resource group name `${resourcegroup}`
|
||||||
"displayName": "${service-principal-name}",
|
* location `${location}`
|
||||||
"name": "http://${service-principal-name}",
|
|
||||||
"password": "e763725a-5eee-40e8-a466-dc88d980f415",
|
|
||||||
"tenant": "72f988bf-86f1-41af-91ab-2d7cd011db48"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
Remember the following parameters.
|
|
||||||
|
|
||||||
- ```appId```: ```${appId}```
|
3. Create Service Principal
|
||||||
- ```password```: ```${password}```
|
|
||||||
- ```displayName```: ```${spName}```
|
|
||||||
- ```tenant```: ```${tenant}```
|
|
||||||
|
|
||||||
|
|
||||||
[The doc about this steps](https://docs.microsoft.com/en-us/azure/aks/kubernetes-service-principal#manually-create-a-service-principal)
|
|
||||||
|
|
||||||
#### Ask your subscription's admin to add the new service principal as the owner of the new resource group.
|
Run the following command:
|
||||||
|
|
||||||
Content as the title. Important and don't forget it.
|
```sh
|
||||||
|
az ad sp create-for-rbac --skip-assignment --name ${service-principal-name}
|
||||||
|
```
|
||||||
|
|
||||||
#### Write Configuration
|
If the command succeeds, you will see output like the following:
|
||||||
|
|
||||||
[Configuration example](config.yml)
|
```json
|
||||||
|
{
|
||||||
|
"appId": "87432405-56b6-4d76-923b-39d1d75d19f7",
|
||||||
|
"displayName": "${service-principal-name}",
|
||||||
|
"name": "http://${service-principal-name}",
|
||||||
|
"password": "ff5b1601-1298-460d-a94f-fcc8b5ef96f0",
|
||||||
|
"tenant": "72e9b8a0-54c8-4742-8da6-1f5d1418c3c5"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
#### Start Cluster
|
Note the following parameters, which will be used later:
|
||||||
|
* appId `${appId}`
|
||||||
|
* password `${password}`
|
||||||
|
* displayName `${spName}`
|
||||||
|
* tenant `${tenant}`
|
||||||
|
|
||||||
```
|
For more details on how to create a service principal, please refer to the [manually-create-a-service-principal document](https://docs.microsoft.com/en-us/azure/aks/kubernetes-service-principal#manually-create-a-service-principal).
|
||||||
python3 azure.py -c config.yml
|
|
||||||
```
|
4. Add the service principal as the owner of the resource group.
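
Step 4 does not give a command. One possible way to do it from the CLI, assuming your account is allowed to assign roles in the subscription, is `az role assignment create` with the `Owner` role scoped to the resource group. The sketch below only builds that command from the parameters noted in the earlier steps. The helper name `build_owner_assignment_cmd` is hypothetical and not part of this contrib.

```python
import shlex


def build_owner_assignment_cmd(subscription_id, resource_group, app_id):
    """Build an `az role assignment create` command (hypothetical helper)."""
    # Azure Resource Manager scope that identifies the resource group.
    scope = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    return [
        "az", "role", "assignment", "create",
        "--assignee", app_id,   # appId of the service principal from step 3
        "--role", "Owner",
        "--scope", scope,
    ]


if __name__ == "__main__":
    # Placeholders only; substitute the values you noted earlier.
    cmd = build_owner_assignment_cmd("<subscriptionId>", "<resourcegroup>", "<appId>")
    # Print a shell-quoted command line for review before running it.
    print(shlex.join(cmd))
```

Printing the command instead of running it lets a subscription admin review it before it is executed.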
|
||||||
|
|
||||||
|
|
||||||
|
## OpenPAI Deployment
|
||||||
|
|
||||||
|
1. Prepare the [configuration file](./config.yaml), replacing the variables with the parameters from the previous steps.
|
||||||
|
To use Cluster Autoscaler, specify the following lines in `openpai_worker_vmss`:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
openpai_worker_vmss:
|
||||||
|
...
|
||||||
|
ca_enable: true
|
||||||
|
min_vm_count: 1
|
||||||
|
max_vm_count: 10
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Deploy a Kubernetes cluster with AKS Engine, and deploy OpenPAI:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
python3 azure.py -c config.yaml
|
||||||
|
```
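
Before running the deployment, it can help to sanity-check the `openpai_worker_vmss` autoscaler settings from step 1. The following is a minimal sketch, not something `azure.py` is known to do. It assumes the section has already been loaded into a Python dict, and the rules it enforces (non-negative integer counts, `min_vm_count` not above `max_vm_count`) are assumptions rather than documented requirements. The function name `check_autoscaler_settings` is hypothetical.

```python
def check_autoscaler_settings(vmss):
    """Return a list of problems found in an `openpai_worker_vmss` mapping."""
    problems = []
    # Nothing to validate when Cluster Autoscaler is not enabled.
    if not vmss.get("ca_enable", False):
        return problems
    for key in ("min_vm_count", "max_vm_count"):
        value = vmss.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            problems.append(f"{key} must be a non-negative integer")
    # Compare the bounds only when both counts are individually valid.
    if not problems and vmss["min_vm_count"] > vmss["max_vm_count"]:
        problems.append("min_vm_count must not exceed max_vm_count")
    return problems


if __name__ == "__main__":
    settings = {"ca_enable": True, "min_vm_count": 1, "max_vm_count": 10}
    for problem in check_autoscaler_settings(settings):
        print(problem)
```

An empty list means the values passed these assumed checks.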
|
||||||
|
|
|
@ -68,7 +68,7 @@ To remove the network plugin, you could use following `ansible-playbook`:
|
||||||
shell: systemctl restart kubelet
|
shell: systemctl restart kubelet
|
||||||
args:
|
args:
|
||||||
executable: /bin/bash
|
executable: /bin/bash
|
||||||
|
|
||||||
- name: restart docker
|
- name: restart docker
|
||||||
shell: systemctl restart docker
|
shell: systemctl restart docker
|
||||||
args:
|
args:
|
||||||
|
@ -113,6 +113,10 @@ Please refer to the [official document](https://github.com/NVIDIA/nvidia-contain
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
|
#### How to deploy on [Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/) with [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)?
|
||||||
|
|
||||||
|
Please refer to [this document](https://github.com/microsoft/pai/tree/master/contrib/aks-engine).
|
||||||
|
|
||||||
## Troubleshooting
|
## Troubleshooting
|
||||||
|
|
||||||
#### Command `Apt install <some package>` fails in the script.
|
#### Command `Apt install <some package>` fails in the script.
|
||||||
|
|