Merge pull request #23 from jinlccs/master

Bug Fix: Allow restart a downloaded file. Add Monitoring.
This commit is contained in:
jinl 2017-09-26 10:24:42 -07:00 committed by GitHub
Parents 1ffa77fc16 542e43b0fb
Commit 2c06a89c9a
6 changed files with 34 additions and 1 deletion

View file

@@ -5,6 +5,7 @@
* [Authentication](../deployment/authentication/FAQ.md)
* [Azure](../deployment/Azure/FAQ.md)
* [ACS](../deployment/ACS/FAQ.md)
* [Cluster Monitoring](../deployment/monitor/FAQ.md)
## Using DL Workspace

27
docs/deployment/monitor/FAQ.md Executable file
View file

@@ -0,0 +1,27 @@
# Frequently Asked Questions on Cluster Monitoring
1. What is the mechanism for monitoring the operation of the cluster?
   By default, the Kubernetes Dashboard and Grafana are installed to monitor the status of the cluster and of individual nodes.
2. How do I access the Kubernetes Dashboard and Grafana?
   1. The Kubernetes Dashboard can be accessed at: https://[infranode]/ui
      Grafana can be accessed at: https://[infranode]/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/?orgId=1
   2. When visiting the monitoring site, you may see a certificate error.
      1. In Chrome, the error message reads: "Your connection is not private. Attackers might be trying to steal your information from [infranode] (for example, passwords, messages, or credit cards)."
         1. Click "Advanced".
         2. Click "Proceed to [infranode] (unsafe)".
      2. In Microsoft Edge, the message reads: "This site is not secure".
         1. Click "Details".
         2. Click "Go on to the web page (Not recommended)".
   3. You will need the admin username and password for the interface. They are set or automatically generated during the deployment procedure. Open cluster.yaml or config.yaml in src/ClusterBootstrap and look for the basic_auth line:
      ```
      basic_auth: [admin_password], [admin_username]
      ```
      Enter admin_username and admin_password to access the Kubernetes Dashboard and Grafana.
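As a rough illustration of the credential lookup described in the last step, the sketch below splits a `basic_auth` line and builds the two monitoring URLs. It assumes the value is a comma-separated `password, username` pair in the order the FAQ shows; the helper names `parse_basic_auth` and `monitoring_urls` and the sample values are hypothetical and not part of DL Workspace.

```python
def parse_basic_auth(line):
    """Split a 'basic_auth: <password>, <username>' line into (username, password).

    Assumes the comma-separated order shown in the FAQ; extra fields are ignored.
    """
    key, _, value = line.partition(":")
    if key.strip() != "basic_auth":
        raise ValueError("not a basic_auth line: %r" % line)
    fields = [f.strip() for f in value.split(",")]
    if len(fields) < 2:
        raise ValueError("expected '<password>, <username>'")
    password, username = fields[0], fields[1]
    return username, password


def monitoring_urls(infranode):
    """Return the dashboard and Grafana URLs from the FAQ for one infra node."""
    base = "https://%s" % infranode
    return {
        "dashboard": base + "/ui",
        "grafana": base
        + "/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/?orgId=1",
    }


if __name__ == "__main__":
    # Placeholder credentials and host name, for illustration only.
    user, pwd = parse_basic_auth("basic_auth: s3cret, admin")
    print(user, monitoring_urls("infra01.example.com")["dashboard"])
```

Returning the pair as `(username, password)` matches the order most login forms and HTTP clients expect, even though the file stores the password first.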

View file

@@ -272,6 +272,7 @@ def acs_get_config():
    if not (os.path.exists('./deploy/bin/kubectl')):
        os.system("mkdir -p ./deploy/bin")
        az_tryuntil("acs kubernetes install-cli --install-location ./deploy/bin/kubectl", lambda : os.path.exists('./deploy/bin/kubectl'))
    os.system("rm ./deploy/%s" % config["acskubeconfig"])
    if not (os.path.exists('./deploy/'+config["acskubeconfig"])):
        cmd = "acs kubernetes get-credentials"
        cmd += " --resource-group=%s" % config["acs_resource_group"]
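The hunk above deletes the cached kubeconfig by shelling out to `rm`, so the existence check that follows always triggers a fresh fetch. A minimal sketch of the same step without a subprocess is shown below; `remove_if_exists` is a hypothetical helper, not a function in this repository, and it differs from the committed code in that a missing file is silently treated as "nothing to do" rather than producing an `rm` error message.

```python
import errno
import os
import tempfile


def remove_if_exists(path):
    """Delete path; return True if a file was removed, False if it was absent."""
    try:
        os.remove(path)
        return True
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            return False  # already gone, which is the state we want
        raise  # permission problems and the like should still surface


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than a real kubeconfig.
    fd, stale = tempfile.mkstemp()
    os.close(fd)
    print(remove_if_exists(stale))  # first call removes the file
    print(remove_if_exists(stale))  # second call finds nothing to remove
```

Under these assumptions the call site would look like `remove_if_exists(os.path.join("deploy", config["acskubeconfig"]))`.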

View file

@@ -22,6 +22,7 @@ pushd .
rm -rf /tmp/install-python
mkdir /tmp/install-python
cd /tmp/install-python
rm ${PYTHONFILE}
while [ ! -e ${PYTHONFILE} ]; do
wget http://downloads.activestate.com/ActivePython/releases/${VERSIONS}/${PYTHONFILE}
done
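The loop above retries until a file with the expected name exists, which is why a leftover partial download from an earlier run has to be removed first. One alternative, sketched here under stated assumptions, is to download to a temporary name and rename only on success, so that the final name never refers to a partial file. `fetch_with_retry` and its `fetch` callback are hypothetical names introduced for illustration; the sketch also bounds the number of attempts, which the shell loop does not.

```python
import os
import time


def fetch_with_retry(fetch, dest, attempts=5, delay=1.0):
    """Call fetch(tmp_path) until it succeeds, then move the result to dest.

    fetch is any callable that writes the download to the path it is given
    and raises on failure. Returns the number of attempts that were needed.
    """
    tmp = dest + ".part"
    for attempt in range(1, attempts + 1):
        try:
            if os.path.exists(tmp):
                os.remove(tmp)  # discard leftovers from a failed attempt
            fetch(tmp)
            os.replace(tmp, dest)  # atomic rename on the same filesystem
            return attempt
        except Exception:
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(delay)
```

With the standard library, `fetch` could be something like `lambda path: urllib.request.urlretrieve(url, path)`, where `url` is whatever the script would otherwise pass to `wget`.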

View file

@@ -12,6 +12,7 @@ sudo apt-get install -y python-dev \
python-setuptools \
apt-transport-https
rm /tools/NVIDIA-Linux-x86_64-*.run
wget -P /tools http://us.download.nvidia.com/XFree86/Linux-x86_64/381.22/NVIDIA-Linux-x86_64-381.22.run
chmod +x /tools/NVIDIA-Linux-x86_64-381.22.run
sh /tools/NVIDIA-Linux-x86_64-381.22.run -a -s
@@ -21,6 +22,7 @@ sudo apt install -y nvidia-modprobe
sudo rm -r /opt/nvidia-driver || true
# Install nvidia-docker and nvidia-docker-plugin
rm /tmp/nvidia-docker*.deb
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb

View file

@@ -53,7 +53,8 @@ sudo service apache2 stop
if lspci | grep -qE "[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F].[0-9] (3D|VGA compatible) controller: NVIDIA Corporation.*" ; then
NVIDIA_VERSION=381.22
# make the script reexecutable after a failed download
rm /tmp/NVIDIA-Linux-x86_64-$NVIDIA_VERSION.run
wget -P /tmp http://us.download.nvidia.com/XFree86/Linux-x86_64/$NVIDIA_VERSION/NVIDIA-Linux-x86_64-$NVIDIA_VERSION.run
chmod +x /tmp/NVIDIA-Linux-x86_64-$NVIDIA_VERSION.run
sudo bash /tmp/NVIDIA-Linux-x86_64-$NVIDIA_VERSION.run -a -s
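The driver download in this hunk only runs when the `lspci | grep -qE ...` test finds an NVIDIA GPU. For readers who want to check what that pattern accepts, here is a small sketch of the same match in Python. The function name `has_nvidia_gpu` and the sample `lspci` lines are illustrative assumptions; the only deliberate change to the pattern is that the dot before the PCI function number is escaped, whereas the shell version lets it match any character.

```python
import re

# Same shape as the pattern in the shell script: bus:device.function, then a
# "3D" or "VGA compatible" controller made by NVIDIA Corporation.
NVIDIA_GPU = re.compile(
    r"[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-9] "
    r"(3D|VGA compatible) controller: NVIDIA Corporation"
)


def has_nvidia_gpu(lspci_output):
    """Return True if any line of lspci output describes an NVIDIA GPU."""
    return any(NVIDIA_GPU.search(line) for line in lspci_output.splitlines())


if __name__ == "__main__":
    # Made-up lspci lines, for illustration only.
    sample = (
        "00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530\n"
        "01:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)\n"
    )
    print(has_nvidia_gpu(sample))
```

Like `grep`, `re.search` matches anywhere in the line, so no anchors or trailing `.*` are needed.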