Update documentation and install prerequisites.
This commit is contained in:
Parent
14cb225e5b
Commit
415d38305e
|
@ -0,0 +1,11 @@
|
|||
# Frequently Asked Questions (FAQ) for Azure Cluster Deployment
|
||||
|
||||
Please refer to [this](../knownissues/Readme.md) for more general deployment issues.
|
||||
|
||||
## After setup, I cannot visit the deployed DL Workspace portal.
|
||||
|
||||
* Please wait a few minutes after the deployment script runs through to allow the portal container to be pulled and scheduled for execution.
|
||||
|
||||
## I can't execute a Spark job on Azure.
|
||||
|
||||
The current default deployment procedure on Azure doesn't deploy HDFS/Spark. So Spark job execution is not available.
|
|
@ -2,18 +2,15 @@
|
|||
|
||||
This document describes the procedure to deploy a DL workspace cluster on Azure. We are still improving the deployment procedure on Azure. Please contact the authors if you encounter a deployment issue.
|
||||
|
||||
Please note that the procedure below doesn't deploy HDFS/Spark on the DLWorkspace cluster on Azure, so Spark job execution is not available on an Azure cluster.
|
||||
|
||||
0. Follow [this document](../DevEnvironment/README.md) to set up the dev environment of DLWorkspace. Log in to your Azure subscription on your dev machine via:
|
||||
|
||||
```
|
||||
az login
|
||||
```
|
||||
|
||||
|
||||
1. Please create a configuration file containing a single line with your cluster name. The cluster name should be unique and contain only lowercase letters or numbers.
|
||||
|
||||
```
|
||||
cluster_name: [your cluster name]
|
||||
```
|
||||
1. Please [configure](configure.md) your Azure cluster.
|
||||
|
||||
2. Initialize the cluster and generate certificates and keys:
|
||||
```
|
||||
|
@ -39,9 +36,6 @@ cluster_name: [your cluster name]
|
|||
```
|
||||
where machine1 is your Azure infrastructure node (you can get its address by running ./deploy.py display).
|
||||
|
||||
|
||||
|
||||
|
||||
The script block executes the following commands in sequence (you do NOT need to run them if you have run step 5):
|
||||
1. Setup basic tools on the Ubuntu image.
|
||||
```
|
||||
|
@ -68,6 +62,6 @@ cluster_name: [your cluster name]
|
|||
./deploy.py kubernetes start jobmanager restfulapi webportal
|
||||
```
|
||||
|
||||
6. If you run into a deployment issue, please check [here](Deployment_Issue.md) first.
|
||||
6. If you run into a deployment issue, please check [here](FAQ.md) first.
|
||||
|
||||
|
||||
|
|
|
@ -0,0 +1,53 @@
|
|||
# Configuration: Azure Cluster
|
||||
|
||||
For more customized configuration, please refer to the [Configuration Section](../configuration/Readme.md).
|
||||
|
||||
## Azure Cluster specific configuration
|
||||
|
||||
We have greatly simplified Azure cluster configuration. At a minimum, you only need to create a config.yaml file under src/ClusterBootstrap that contains the cluster name.
|
||||
|
||||
### Cluster Name
|
||||
|
||||
The cluster name must be unique and should be specified as:
|
||||
|
||||
```
|
||||
cluster_name: [your cluster name]
|
||||
```
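A small pre-flight check can catch an invalid name before deployment. The sketch below is not part of DLWorkspace: the helper names `is_valid_cluster_name` and `render_minimal_config` are hypothetical, and the rule it enforces (lowercase letters and digits only) is an assumption taken from the Azure deployment guide's wording. It cannot check uniqueness, which depends on what already exists in Azure.

```python
import re

# Assumption: "lowercase letters or numbers" means ASCII a-z and 0-9 only.
_CLUSTER_NAME_RE = re.compile(r"[a-z0-9]+")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if name is non-empty and uses only lowercase letters/digits."""
    return _CLUSTER_NAME_RE.fullmatch(name) is not None


def render_minimal_config(name: str) -> str:
    """Return the single-line config.yaml content for a valid cluster name."""
    if not is_valid_cluster_name(name):
        raise ValueError(f"invalid cluster name: {name!r}")
    return f"cluster_name: {name}\n"
```

For example, `render_minimal_config("dlws01")` yields the one-line content shown above, which could then be written to src/ClusterBootstrap/config.yaml. Note that an all-digit name would be read back by a YAML parser as a number, so it may need quoting; the sketch leaves that aside.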
|
||||
|
||||
### Authentication
|
||||
|
||||
If you are not building a cluster for Microsoft employee usage, you will also need to configure [Authentication](../authentication/Readme.md).
|
||||
|
||||
### Additional configuration
|
||||
|
||||
You may provide or change the specification of the deployed Azure cluster by adding the following information to the config.yaml file:
|
||||
|
||||
```
azure_cluster:
  <<your cluster name>>:
    "infra_node_num": 1
    "worker_node_num": 2
    "azure_location": "westus2"
    "infra_vm_size": "Standard_D1_v2"
    "worker_vm_size": "Standard_NC6"
```
|
||||
|
||||
* infra_node_num: the number of infrastructure nodes for the deployment. It should be odd (1, 3 or 5). Three infrastructure nodes tolerate 1 failure, and five infrastructure nodes tolerate 2 failures. However, more infrastructure nodes (and more failure tolerance) will reduce performance.
|
||||
|
||||
* worker_node_num: the number of worker nodes used for the deployment.
|
||||
|
||||
* azure_location:
|
||||
|
||||
Please use the following command to find all available Azure locations.
|
||||
```
|
||||
az account list-locations
|
||||
```
|
||||
|
||||
* infra_vm_size, worker_vm_size: infrastructure and worker VM size.
|
||||
|
||||
Usually, a CPU VM will be used for infra_vm_size, and a GPU VM will be used for worker_vm_size. Please use the following command to find all available Azure VM sizes.
|
||||
```
|
||||
az vm list-sizes --location westus2
|
||||
```
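The failure-tolerance figures quoted for infra_node_num are consistent with majority quorum: a group of n nodes keeps working while more than half of them are up, so it tolerates (n - 1) // 2 failures. The sketch below is illustrative only. `infra_failure_tolerance` and `check_infra_node_num` are hypothetical helpers, not DLWorkspace functions, and the assumption that the infrastructure nodes rely on majority quorum is inferred from the numbers above rather than stated in this document.

```python
# Values listed for infra_node_num in the configuration notes above.
ALLOWED_INFRA_NODE_NUMS = (1, 3, 5)


def infra_failure_tolerance(infra_node_num: int) -> int:
    """Failures a majority-quorum group of this size can survive (assumed model)."""
    if infra_node_num < 1:
        raise ValueError("infra_node_num must be at least 1")
    return (infra_node_num - 1) // 2


def check_infra_node_num(infra_node_num: int) -> None:
    """Raise ValueError unless the value is one of the documented choices."""
    if infra_node_num not in ALLOWED_INFRA_NODE_NUMS:
        raise ValueError(
            f"infra_node_num should be one of {ALLOWED_INFRA_NODE_NUMS}, "
            f"got {infra_node_num}"
        )
```

Under this model an even count adds no tolerance over the next lower odd count (4 nodes still survive only 1 failure), which is one reason to prefer odd values.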
|
||||
|
||||
|
|
@ -1,6 +1,6 @@
|
|||
# Build ISO (for USB) or docker image (PXE) for DL workspace deployment.
|
||||
|
||||
This document describes the procedure to build the deployment image for DL workspace cluster. After the configuration file is [created](Configuration.md), in directory 'src/ClusterBootstrap', run:
|
||||
This document describes the procedure to build the deployment image for DL workspace cluster. After the configuration file is [created](configuration/Readme.md), in directory 'src/ClusterBootstrap', run:
|
||||
|
||||
```
|
||||
python deploy.py -y build
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
|
||||
0. [Run Once] The installation program needs certain utilities (such as docker and python). The simplest setup is to use the [development docker](../../DevDocker.md) and run the subsequent command in the development docker. Alternatively, you may choose to run install_prerequisites.sh once to install utilities.
|
||||
|
||||
1. [Create Configuration file](Configuration.md), and determine important information of the cluster (e.g., cluster name, number of Etcd servers used). Please refer to [Backup/Restore](Backup.md) on instruction to backup/restore cluster configuration.
|
||||
1. [Create Configuration file](configuration/Readme.md), and determine important information of the cluster (e.g., cluster name, number of Etcd servers used). Please refer to [Backup/Restore](Backup.md) on instruction to backup/restore cluster configuration.
|
||||
|
||||
2. Configure the storage system to be used in the cluster, following the instructions in [Storage.md](Storage.md). If you are on the Microsoft corporate network and just want to test DLWorkspace at **small scale**, you may add the following configuration to config.yaml to use the public NFS server provided by the CCS group.
|
||||
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
|
||||
This document describes the procedure to deploy DL workspace cluster via PXE server. **__The procedure will automatically wipe out the system disk and deploy a CoreOS image of DL workspace cluster__** to all machines that are on the same subnet and served by the PXE server. Please proceed with caution.
|
||||
|
||||
1. [Create Configuration file](Configuration.md)
|
||||
1. [Create Configuration file](configuration/Readme.md)
|
||||
|
||||
2. [Build a bootable image](Build.md).
|
||||
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
# Deployment of DL workspace cluster via USB sticks
|
||||
|
||||
This document describes the procedure to build and deploy a small DL workspace cluster via USB sticks. The key procedures are:
|
||||
1. [Create Configuration file](Configuration.md)
|
||||
1. [Create Configuration file](configuration/Readme.md)
|
||||
2. [Build a bootable image](Build.md) (ISO/docker image) :
|
||||
3. Burn a USB stick (>=1GB):
|
||||
You could use [Rufus](https://www.ubuntu.com/download/desktop/create-a-usb-stick-on-windows) tool recommended by Ubuntu or many other tools to burn .iso to a USB stick. If Rufus is used, please use the following options:
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
|
||||
This document describes the procedure to deploy a DL workspace cluster on an Ubuntu cluster that is on a VLAN, with an initial node that is used as a PXE server to prime the cluster.
|
||||
|
||||
1. Please [create Configuration file](Configuration.md) and build [the relevant deployment key](Build.md).
|
||||
1. Please [create Configuration file](configuration/Readme.md) and build [the relevant deployment key](Build.md).
|
||||
Please copy config_azure.yaml.template to config.yaml, and fill in the necessary information of the cluster.
|
||||
|
||||
You may add the configuration to either config.yaml, or cluster.yaml, with the following entry:
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
|
||||
## I don't see the login option (Gmail, Live.com, etc.)
|
||||
|
||||
Please check if you have properly configured [authentication](../authentication/Readme.md), and include the authentication option in the [config.yaml](../Configuration.md) file.
|
||||
Please check if you have properly configured [authentication](../authentication/Readme.md), and include the authentication option in the [config.yaml](../configuration/Readme.md) file.
|
||||
|
||||
## I am the administrator, and when I log in, I get the message "You are not an authorized user to this cluster!"
|
||||
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
|
||||
This document describes the procedure to deploy DL workspace on a previously deployed CoreOS cluster (e.g., certain Philly deployments). This document is still evolving. Please contact the author with any questions.
|
||||
|
||||
1. [Create Configuration file](Configuration.md)
|
||||
1. [Create Configuration file](configuration/Readme.md)
|
||||
1. Please copy src/ClusterBootstrap/config_philly.yaml.template to src/ClusterBootstrap/config.yaml, and further fill in various
|
||||
parameters in the config file.
|
||||
2. Please specify the private key used to access the Philly cluster.
|
||||
|
|
|
@ -4,7 +4,7 @@ Please follow the following setup to deploy DL workspace on philly.
|
|||
|
||||
0. [Run Once] The installation program needs certain utilities (such as docker and python). The simplest setup is to use the [development docker](../../DevDocker.md) and run the subsequent command in the development docker. Alternatively, you may choose to run install_prerequisites.sh once to install utilities.
|
||||
|
||||
1. [Create Configuration file](Configuration.md), and determine important information of the cluster (e.g., cluster name, number of Etcd servers used). Please refer to [Backup/Restore](Backup.md) on instruction to backup/restore cluster configuration.
|
||||
1. [Create Configuration file](../deployment/configuration/Readme.md), and determine important information of the cluster (e.g., cluster name, number of Etcd servers used). Please refer to [Backup/Restore](Backup.md) on instruction to backup/restore cluster configuration.
|
||||
|
||||
2. [Build deployment images and various executables](Build.md):
|
||||
```
|
||||
|
|
|
@ -51,5 +51,11 @@ sudo apt-key adv --keyserver packages.microsoft.com --recv-keys 417A0893
|
|||
sudo apt-get install apt-transport-https
|
||||
sudo apt-get update && sudo apt-get install azure-cli
|
||||
|
||||
# Disable dnsmasq in NetworkManager by commenting out the dns=dnsmasq line.
|
||||
if [ -f /etc/NetworkManager/NetworkManager.conf ]; then
|
||||
sed "s/^dns=dnsmasq$/#dns=dnsmasq/" /etc/NetworkManager/NetworkManager.conf > /tmp/NetworkManager.conf && sudo mv /tmp/NetworkManager.conf /etc/NetworkManager/NetworkManager.conf
|
||||
sudo service network-manager restart
|
||||
fi
|
||||
|
||||
|
||||
|
||||
|
|
|
@ -367,7 +367,7 @@ if __name__ == '__main__':
|
|||
Create and manage an Azure VM cluster.
|
||||
|
||||
Prerequisite:
|
||||
* Create azure_cluster_config.yaml according to instruction in docs/deployment/Configuration.md.
|
||||
* Create config.yaml according to instruction in docs/deployment/azure/configure.md.
|
||||
|
||||
Command:
|
||||
create Create an Azure VM cluster based on the parameters in config file.
|
||||
|
|