Merge pull request #23 from jinlccs/master
Bug Fix: Allow restart a downloaded file. Add Monitoring.
Commit 2c06a89c9a

@@ -5,6 +5,7 @@
* [Authentication](../deployment/authentication/FAQ.md)
* [Azure](../deployment/Azure/FAQ.md)
* [ACS](../deployment/ACS/FAQ.md)
* [Cluster Monitoring](../deployment/monitor/FAQ.md)

## Using DL Workspace

@@ -0,0 +1,27 @@
# Frequently Asked Questions on Cluster Monitoring

1. What is the mechanism for monitoring the operation of the cluster?

   By default, the Kubernetes Dashboard and Grafana are installed to monitor the status of the cluster and of individual nodes.

2. How do I access the Kubernetes Dashboard and Grafana?

   1. The Kubernetes Dashboard can be accessed at: https://[infranode]/ui
      Grafana can be accessed at: https://[infranode]/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/?orgId=1

   2. When visiting the monitoring site, you may get a certificate error.
      1. In Chrome, the error message will read "Your connection is not private. Attackers might be trying to steal your information from [infranode] (for example, passwords, messages, or credit cards)."
         1. Click "Advanced".
         2. Click "Proceed to [infranode] (unsafe)".
      2. In Microsoft Edge, you will get the message "This site is not secure".
         1. Click "Details".
         2. Click "Go on to the webpage (Not recommended)".

   3. You will need the admin username and password for the interface. They are set or automatically generated during deployment. Look for the file cluster.yaml or config.yaml in src/ClusterBootstrap and find the basic_auth line:

      ```
      basic_auth: [admin_password], [admin_username]
      ```
      Type in the admin_username and admin_password to access the Kubernetes Dashboard and Grafana.

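The access steps in the FAQ above can be sketched in Python. This is only an illustration: the function names `monitoring_urls` and `parse_basic_auth` and the host name `infra01.example.com` are assumptions, not part of this commit. The URL paths and the `basic_auth` line format are copied from the FAQ text.

```python
def monitoring_urls(infranode):
    """Build the Dashboard and Grafana URLs quoted in the FAQ for one infra node."""
    base = "https://%s" % infranode
    return {
        "dashboard": base + "/ui",
        "grafana": base
        + "/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/?orgId=1",
    }


def parse_basic_auth(line):
    """Split a 'basic_auth: a, b' line into its two comma-separated fields.

    The fields are returned in file order; which one is the password and
    which is the username follows the template shown in the FAQ.
    """
    key, _, value = line.partition(":")
    if key.strip() != "basic_auth":
        raise ValueError("not a basic_auth line: %r" % line)
    first, second = [part.strip() for part in value.split(",", 1)]
    return first, second


if __name__ == "__main__":
    # Hypothetical host name, used only for illustration.
    for name, url in sorted(monitoring_urls("infra01.example.com").items()):
        print(name, url)
```

Because the cluster certificate triggers the browser warning described in the FAQ, a script fetching these URLs would likely hit the same certificate error and need to handle it explicitly.
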
@@ -272,6 +272,7 @@ def acs_get_config():
    if not (os.path.exists('./deploy/bin/kubectl')):
        os.system("mkdir -p ./deploy/bin")
        az_tryuntil("acs kubernetes install-cli --install-location ./deploy/bin/kubectl", lambda : os.path.exists('./deploy/bin/kubectl'))
    os.system("rm ./deploy/%s" % config["acskubeconfig"])
    if not (os.path.exists('./deploy/'+config["acskubeconfig"])):
        cmd = "acs kubernetes get-credentials"
        cmd += " --resource-group=%s" % config["acs_resource_group"]

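The hunk above deletes any existing kubeconfig before fetching it again, so a stale or partial file is not mistaken for a finished download. A minimal sketch of that pattern follows. `try_until` and `refresh_file` are hypothetical helpers written for this note; the real `az_tryuntil` is not shown in this diff and may behave differently.

```python
import os
import time


def try_until(action, done, max_attempts=5, delay_seconds=0.0):
    """Call action() until done() is true; return True on success."""
    for _ in range(max_attempts):
        if done():
            return True
        action()
        if delay_seconds:
            time.sleep(delay_seconds)
    return done()


def refresh_file(path, fetch, max_attempts=5):
    """Delete any stale copy of path, then call fetch(path) until it exists."""
    if os.path.exists(path):
        os.remove(path)  # a leftover file would otherwise look like success
    return try_until(
        lambda: fetch(path),
        lambda: os.path.exists(path),
        max_attempts,
    )
```

Here `fetch` stands in for whatever command produces the file. Bounding the number of attempts is a choice made for this sketch, not something the diff shows.
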
@@ -22,6 +22,7 @@ pushd .
rm -rf /tmp/install-python
mkdir /tmp/install-python
cd /tmp/install-python
rm ${PYTHONFILE}
while [ ! -e ${PYTHONFILE} ]; do
    wget http://downloads.activestate.com/ActivePython/releases/${VERSIONS}/${PYTHONFILE}
done

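The loop above treats the existence of the file as proof that the download finished, which is why a leftover file has to be removed first. An alternative, shown here only as a sketch and not as what this commit does, is to download under a temporary name and rename on success, so the final name never refers to a partial file. `download_atomically` and its `fetch` parameter are hypothetical names.

```python
import os


def download_atomically(url, dest, fetch):
    """Fetch url into dest so that dest only ever exists as a complete file.

    fetch(url, path) is a caller-supplied function that writes the payload
    to path and raises an exception on failure.
    """
    partial = dest + ".part"
    if os.path.exists(partial):
        os.remove(partial)  # discard leftovers from an interrupted run
    try:
        fetch(url, partial)
    except Exception:
        if os.path.exists(partial):
            os.remove(partial)
        raise
    os.replace(partial, dest)  # atomic rename on the same filesystem
    return dest
```

In Python 3, `urllib.request.urlretrieve` accepts a URL and a destination file name, so it could be passed as `fetch`.
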
@@ -12,6 +12,7 @@ sudo apt-get install -y python-dev \
    python-setuptools \
    apt-transport-https

rm /tools/NVIDIA-Linux-x86_64-*.run
wget -P /tools http://us.download.nvidia.com/XFree86/Linux-x86_64/381.22/NVIDIA-Linux-x86_64-381.22.run
chmod +x /tools/NVIDIA-Linux-x86_64-381.22.run
sh /tools/NVIDIA-Linux-x86_64-381.22.run -a -s

@@ -21,6 +22,7 @@ sudo apt install -y nvidia-modprobe
sudo rm -r /opt/nvidia-driver || true

# Install nvidia-docker and nvidia-docker-plugin
rm /tmp/nvidia-docker*.deb
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb

@@ -53,7 +53,8 @@ sudo service apache2 stop
if lspci | grep -qE "[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F].[0-9] (3D|VGA compatible) controller: NVIDIA Corporation.*" ; then

    NVIDIA_VERSION=381.22

    # make the script reexecutable after a failed download
    rm /tmp/NVIDIA-Linux-x86_64-$NVIDIA_VERSION.run
    wget -P /tmp http://us.download.nvidia.com/XFree86/Linux-x86_64/$NVIDIA_VERSION/NVIDIA-Linux-x86_64-$NVIDIA_VERSION.run
    chmod +x /tmp/NVIDIA-Linux-x86_64-$NVIDIA_VERSION.run
    sudo bash /tmp/NVIDIA-Linux-x86_64-$NVIDIA_VERSION.run -a -s

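The `lspci | grep -qE` test in the hunk above decides whether the driver is installed at all. The sketch below applies the same pattern in Python. The function name `has_nvidia_gpu` and the sample lines in the usage are assumptions made up for illustration, not captured output.

```python
import re

# Same pattern as the grep -E expression in the script above.
NVIDIA_GPU = re.compile(
    r"[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F].[0-9] "
    r"(3D|VGA compatible) controller: NVIDIA Corporation.*"
)


def has_nvidia_gpu(lspci_output):
    """Return True if any line of lspci output describes an NVIDIA GPU."""
    return any(NVIDIA_GPU.search(line) for line in lspci_output.splitlines())


if __name__ == "__main__":
    # Illustrative lines in lspci's usual format.
    sample = (
        "00:02.0 VGA compatible controller: Intel Corporation HD Graphics\n"
        "01:00.0 VGA compatible controller: NVIDIA Corporation GP102\n"
        "01:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller\n"
    )
    print(has_nvidia_gpu(sample))
```

The pattern requires the device class to be "3D" or "VGA compatible", so an NVIDIA audio function on its own does not count as a GPU.
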