diff --git a/labs/day1-labs/00-lab-environment.md b/labs/day1-labs/00-lab-environment.md new file mode 100644 index 0000000..cb6b954 --- /dev/null +++ b/labs/day1-labs/00-lab-environment.md @@ -0,0 +1,61 @@ +# Lab Environment + +## Classroom Setting + +These labs are designed for delivery in a classroom setting with the **Azure Global Blackbelt Team.** We typically provide an Azure subscription and a Linux VM (jumpbox) for attendees to complete the labs. + +### Getting Registered + +* Register for the class with the URL provided by the team (eg - http://aka.ms/something). + + ![alt text](img/spektra-register.png "Spektra Registration") + +* On the next page, click the `Launch Lab` button. +* Wait for the lab to be prepared. You will receive **TWO** emails. Wait for the second email and the lab details to appear in the browser. _This can take a few minutes._ +* Note the details for your On Demand Lab: + * Azure Credentials + * Service Principal Details + * Environment Details + + ![alt text](img/spektra-ready.png "Spektra ready") + +### Setup Environment + +* The first two labs will require you to RDP into a Linux jumpbox in the Azure subscription created for you. + * Ensure you have a proper RDP client on your PC. + * On the Mac, use Remote Desktop Client in the App Store. +* Setup Azure Cloud Shell: + + 1. Browse to http://portal.azure.com + 2. Login with the Azure credentials that were created in the previous steps (eg - "odl_user_12345@gbbossteamoutlook.onmicrosoft.com") + 3. Click on the cloud shell icon to start your session. + + ![alt text](img/cloud-shell-start.png "Spektra ready") + + 4. Select `Bash (Linux)` + 5. You will be prompted to setup storage for your cloud shell. Click `Show advanced settings` + + ![alt text](img/cloud-show-advanced.png "Spektra ready") + + 6. Provide a unique value for Storage account name. This must be all lower case and no punctuation. Use "cloudshell" for File share name. See example below. + + ![alt text](img/cloud-storage-config.png "Spektra ready") + + 7. Click `Create storage` + + > Note: You can also use the dedicated Azure Cloud Shell URL: http://shell.azure.com + + +## Self-guided + +It is possible to use your own machine outside of the classroom. You will need the following in order to complete these labs: + +* Azure subscription +* Linux, Mac, or Windows with Bash +* Docker +* Azure CLI +* Visual Studio Code +* Helm +* Kubernetes CLI (kubectl) +* MongoDB (only lab #1 requires this) +* GitHub account and git tools diff --git a/labs/day1-labs/01-create-aks-cluster.md b/labs/day1-labs/01-create-aks-cluster.md new file mode 100644 index 0000000..4a7d43a --- /dev/null +++ b/labs/day1-labs/01-create-aks-cluster.md @@ -0,0 +1,110 @@ +# Azure Kubernetes Service (AKS) Deployment + +## Create AKS cluster + +1. Login to Azure Portal at http://portal.azure.com. Your Azure login ID will look something like `odl_user_9294@gbbossteamoutlook.onmicrosoft.com` +2. Open the Azure Cloud Shell + + ![Azure Cloud Shell](img/cloudshell.png "Azure Cloud Shell") + +3. The first time Cloud Shell is started will require you to create a storage account. In our lab, you must click `Advanced` and enter an account name and share. + +4. Once your cloud shell is started, clone the workshop repo into the cloud shell environment + ``` + git clone https://github.com/Azure/blackbelt-aks-hackfest.git + ``` + +5. In the cloud shell, you are automatically logged into your Azure subscription. ```az login``` is not required. + +6. 
Verify your subscription is correctly selected as the default + ``` + az account list + ``` + +7. Find your RG name + + ``` + az group list + ``` + + ``` + + [ + { + "id": "/subscriptions/b23accae-e655-44e6-a08d-85fb5f1bb854/resourceGroups/ODL-aks-v2-gbb-8386", + "location": "centralus", + "managedBy": null, + "name": "ODL-aks-v2-gbb-8386", + "properties": { + "provisioningState": "Succeeded" + }, + "tags": { + "AttendeeId": "8391", + "LaunchId": "486", + "LaunchType": "ODL", + "TemplateId": "1153" + } + } + ] + + # copy the name from the results above and set to a variable + + NAME= + + # We need to use a different cluster name, as sometimes the name in the group list has an underscore, and only dashes are permitted + + CLUSTER_NAME="${NAME//_}" + + ``` + +8. Create your AKS cluster in the resource group created above with 2 nodes, targeting Kubernetes version 1.7.7 + ``` + # This command can take 5-25 minutes to run as it is creating the AKS cluster. Please be PATIENT... + + # set the location to one of the provided AKS locations (eg - centralus, eastus) + LOCATION= + + az aks create -n $CLUSTER_NAME -g $NAME -c 2 -k 1.7.7 --generate-ssh-keys -l $LOCATION + ``` + +9. Verify your cluster status. The `ProvisioningState` should be `Succeeded` + ``` + az aks list -o table + + Name Location ResourceGroup KubernetesVersion ProvisioningState Fqdn + ------------------- ---------- -------------------- ------------------- ------------------- ------------------------------------------------------------------- + ODLaks-v2-gbb-16502 centralus ODL_aks-v2-gbb-16502 1.7.7 Succeeded odlaks-v2--odlaks-v2-gbb-16-b23acc-17863579.hcp.centralus.azmk8s.io + ``` + + +10. Get the Kubernetes config files for your new AKS cluster + ``` + az aks get-credentials -n $CLUSTER_NAME -g $NAME + ``` + +11. Verify you have API access to your new AKS cluster + + > Note: It can take 5 minutes for your nodes to appear and be in READY state. You can run `watch kubectl get nodes` to monitor status. + + ``` + kubectl get nodes + + NAME STATUS ROLES AGE VERSION + aks-nodepool1-20004257-0 Ready agent 4m v1.7.7 + aks-nodepool1-20004257-1 Ready agent 4m v1.7.7 + ``` + + To see more details about your cluster: + + ``` + kubectl cluster-info + + Kubernetes master is running at https://odlaks-v2--odlaks-v2-gbb-11-b23acc-115da6a3.hcp.centralus.azmk8s.io:443 + Heapster is running at https://odlaks-v2--odlaks-v2-gbb-11-b23acc-115da6a3.hcp.centralus.azmk8s.io:443/api/v1/namespaces/kube-system/services/heapster/proxy + KubeDNS is running at https://odlaks-v2--odlaks-v2-gbb-11-b23acc-115da6a3.hcp.centralus.azmk8s.io:443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy + kubernetes-dashboard is running at https://odlaks-v2--odlaks-v2-gbb-11-b23acc-115da6a3.hcp.centralus.azmk8s.io:443/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy + + To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'. + ``` + +You should now have a Kubernetes cluster running with 2 nodes. You do not see the master servers for the cluster because these are managed by Microsoft. The Control Plane services which manage the Kubernetes cluster such as scheduling, API access, configuration data store and object controllers are all provided as services to the nodes. 
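If you ever need to re-run the whole sequence in one pass (for example after tearing a cluster down), the sketch below strings the steps above together. It assumes the `$NAME`, `$CLUSTER_NAME`, and `$LOCATION` variables from steps 7 and 8, and it simply picks the first resource group in the subscription, which matches the single-group lab environment but may not hold for your own subscription.

```
# Minimal end-to-end sketch of this lab, assuming a single ODL resource group.
NAME=$(az group list --query "[0].name" -o tsv)   # first (and only) resource group in the lab subscription
CLUSTER_NAME="${NAME//_}"                          # AKS cluster names cannot contain underscores
LOCATION=centralus                                 # set to one of the provided AKS locations

az aks create -n $CLUSTER_NAME -g $NAME -c 2 -k 1.7.7 --generate-ssh-keys -l $LOCATION
az aks get-credentials -n $CLUSTER_NAME -g $NAME

# nodes can take ~5 minutes to reach Ready
kubectl get nodes
```
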
diff --git a/labs/day1-labs/02-deploy-app-aks.md b/labs/day1-labs/02-deploy-app-aks.md new file mode 100644 index 0000000..e18df8f --- /dev/null +++ b/labs/day1-labs/02-deploy-app-aks.md @@ -0,0 +1,131 @@ +# Deploy the Superhero Ratings App to AKS + +## Review/Edit the YAML Config Files + +1. In Azure Cloud Shell edit `heroes-db.yaml` using `vi` + ``` + cd ~/blackbelt-aks-hackfest/labs/helper-files + + vi heroes-db.yaml + ``` + * Review the yaml file and learn about some of the settings + * Update the yaml file for the proper container image name + * You will need to replace the `` with the ACR login server created in lab 2 + * Example: + + ``` + spec: + containers: + - image: mycontainerregistry.azurecr.io/azureworkshop/rating-db:v1 + name: heroes-db-cntnr + ``` + +2. In Azure Cloud Shell edit `heroes-web-api.yaml` using `vi` + ``` + cd ~/blackbelt-aks-hackfest/labs/helper-files + + vi heroes-web-api.yaml + ``` + * Review the yaml file and learn about some of the settings. Note the environment variables that allow the services to connect + * Update the yaml file for the proper container image names. + * You will need to replace the `` with the ACR login server created in lab 2 + > Note: You will update the image name TWICE updating the web and api container images. + + * Example: + + ``` + spec: + containers: + - image: mycontainerregistry.azurecr.io/azureworkshop/rating-web:v1 + name: heroes-web-cntnr + ``` + +## Setup AKS with access to Azure Container Registry + +There are a few ways that AKS clusters can access your private Azure Container Registry. Generally the service account that kubernetes utilizes will have rights based on its Azure credentials. In our lab config, we must create a secret to allow this access. + +``` +# set these values to yours +ACR_SERVER= +ACR_USER= +ACR_PWD= + +kubectl create secret docker-registry acr-secret --docker-server=$ACR_SERVER --docker-username=$ACR_USER --docker-password=$ACR_PWD --docker-email=superman@heroes.com +``` + +> Note: You can review the `heroes-db.yaml` and `heroes-web-api.yaml` to see where the `imagePullSecrets` are configured. 
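Before deploying, you can confirm the secret landed in the cluster. A quick check like the one below (using the `acr-secret` name created above) should show a secret of type `kubernetes.io/dockerconfigjson`; the deployment manifests then consume it via `imagePullSecrets`.

```
# sanity check that the registry secret exists and has the expected type
kubectl get secret acr-secret
kubectl get secret acr-secret -o jsonpath='{.type}'   # should print kubernetes.io/dockerconfigjson

# the deployments reference it under spec.template.spec, e.g.:
#   imagePullSecrets:
#   - name: acr-secret
```
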
+ +## Deploy database container to AKS + +* Use the kubectl CLI to deploy each app + ``` + cd ~/blackbelt-aks-hackfest/labs/helper-files + + kubectl apply -f heroes-db.yaml + ``` + +* Get mongodb pod name + ``` + kubectl get pods + + NAME READY STATUS RESTARTS AGE + heroes-db-deploy-2357291595-k7wjk 1/1 Running 0 3m + + MONGO_POD=heroes-db-deploy-2357291595-k7wjk + ``` + +* Import data into MongoDB using script + ``` + # ensure the pod name variable is set to your pod name + # once you exec into pod, run the `import.sh` script + + kubectl exec -it $MONGO_POD bash + + root@heroes-db-deploy-2357291595-xb4xm:/# ./import.sh + 2018-01-16T21:38:44.819+0000 connected to: localhost + 2018-01-16T21:38:44.918+0000 imported 4 documents + 2018-01-16T21:38:44.927+0000 connected to: localhost + 2018-01-16T21:38:45.031+0000 imported 72 documents + 2018-01-16T21:38:45.040+0000 connected to: localhost + 2018-01-16T21:38:45.152+0000 imported 2 documents + root@heroes-db-deploy-2357291595-xb4xm:/# exit + + # be sure to exit pod as shown above + ``` + +## Deploy the web and api containers to AKS + +* Use the kubectl CLI to deploy each app + + ``` + cd ~/blackbelt-aks-hackfest/labs/helper-files + + kubectl apply -f heroes-web-api.yaml + ``` + +## Validate + +* Check to see if pods are running in your cluster + ``` + kubectl get pods + + NAME READY STATUS RESTARTS AGE + heroes-api-deploy-1140957751-2z16s 1/1 Running 0 2m + heroes-db-deploy-2357291595-k7wjk 1/1 Running 0 3m + heroes-web-1645635641-pfzf9 1/1 Running 0 2m + ``` + +* Check to see if services are deployed. + ``` + kubectl get service + + NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE + api LoadBalancer 10.0.20.156 52.176.104.50 3000:31416/TCP 5m + kubernetes ClusterIP 10.0.0.1 443/TCP 12m + mongodb ClusterIP 10.0.5.133 27017/TCP 5m + web LoadBalancer 10.0.54.206 52.165.235.114 8080:32404/TCP 5m + ``` + +* Browse to the External IP for your web application (on port 8080) and try the app + +> The public IP can take a few minutes to create with a new cluster. Sit back and relax. Maybe check Facebook. \ No newline at end of file diff --git a/labs/day1-labs/03-kubernetes-ui.md b/labs/day1-labs/03-kubernetes-ui.md new file mode 100644 index 0000000..12a0413 --- /dev/null +++ b/labs/day1-labs/03-kubernetes-ui.md @@ -0,0 +1,30 @@ +# Kubernetes Dashboard + +The Kubernetes dashboard is a web ui that lets you view, monitor, and troubleshoot Kubernetes resources. + +> Note: The Kubernetes dashboard is a secured endpoint and can only be accessed using the SSH keys for the cluster. Since cloud shell runs in the browser, it is not possible to tunnel to the dashboard using the steps below. + +### Accessing The Dashboard UI + +There are multiple ways of accessing Kubernetes dashboard. You can access through kubectl command-line interface or through the master server API. We'll be using kubectl, as it provides a secure connection, that doesn't expose the UI to the internet. + +1. Command-Line Proxy + + * Open an RDP session to the jumpbox IP with username and password + * Run ```az login``` to authenticate with Azure in order to use Azure CLI in the Jumpbox instead of Cloud Shell + * Run ```NAME=$(az group list -o table | grep ODL | awk '{print $1}')``` in order to retrieve the name of the resource group for your Azure account and put it in the NAME variable. 
+ * Run ```CLUSTER_NAME="${NAME//_}"``` in order to retrieve the cluster name (to remove the underscore) + * Run ```az aks get-credentials -n $CLUSTER_NAME -g $NAME``` in order to get the credentials to access our managed Kubernetes cluster in Azure + * Run ```kubectl proxy``` + * This creates a local proxy to 127.0.0.1:8001 + * Open a web browser (Firefox is pre-installed on the Jumpbox) and point to: + +### Explore Kubernetes Dashboard + +1. In the Kubernetes Dashboard select nodes to view +![](img/ui_nodes.png) +2. Explore the different node properties available through the dashboard +3. Explore the different pod properties available through the dashboard ![](img/ui_pods.png) +4. In this lab feel free to take a look around other at other resources Kubernetes provides through the dashboard + +> To learn more about Kubernetes objects and resources, browse the documentation: diff --git a/labs/day1-labs/04-monitoring-k8s.md b/labs/day1-labs/04-monitoring-k8s.md new file mode 100644 index 0000000..a81b7a9 --- /dev/null +++ b/labs/day1-labs/04-monitoring-k8s.md @@ -0,0 +1,126 @@ +# Add Monitoring to an Azure Kubernetes Service Cluster + +There are a number of monitoring solutions available today. Here is a quick, but not exhaustive list for reference purposes: +* Datadog +* Sysdig +* Elastic Stack +* Splunk +* Operations Management Suite +* Prometheus + +For the purposes of this lab we will be focusing in on Prometheus and using Grafana to provide a visual Dashboard of our Azure Kubernetes Service Cluster. + +## Install Helm + +We are going to be installing Prometheus and Grafana into our K8s cluster using Helm and Tiller. You can think of Helm as a package manager for Kubernetes with Tiller being the server-side component. + +1. In the Azure Cloud Shell, the Helm CLI is already installed + +2. Initialize Helm + ``` + helm init + ``` + +3. Validate Helm and Tiller were installed successfully + ``` + helm version + # You should see something like the following as output: + Client: &version.Version{SemVer:"v2.7.2", GitCommit:"8478fb4fc723885b155c924d1c8c410b7a9444e6", GitTreeState:"clean"} + Server: &version.Version{SemVer:"v2.7.2", GitCommit:"8478fb4fc723885b155c924d1c8c410b7a9444e6", GitTreeState:"clean"} + ``` + +## Install Prometheus using Helm +Prometheus is a Cloud Native Computing Foundation (CNCF) project used to collect and process metrics. It collects metrics from configured targets, in our case it is a Kubernetes Cluster. + +1. Install Prometheus using Helm CLI + + Switch to the `helper-files` directory and view the `prometheus-configforhelm.yaml` file. This configures Helm to install Prometheus with our desired settings. + ``` + cd ~/blackbelt-aks-hackfest/labs/helper-files + # The following command will install Prometheus into the K8s cluster using custom settings + + helm install --name gbbhackprometheus stable/prometheus --version 4.6.13 -f prometheus-configforhelm.yaml + ``` + +2. 
Validate that Prometheus was Installed + ``` + kubectl get pods | grep prometheus + # You should see something like the following as output: + gbbhackprometheus-prometheus-kube-state-metrics-5b9f4d9d9-vctrx 1/1 Running 0 3m + gbbhackprometheus-prometheus-node-exporter-v6frn 1/1 Running 0 3m + gbbhackprometheus-prometheus-server-54f5bcb797-sbzsp 2/2 Running 0 3m + ``` + + ``` + kubectl get svc | grep prometheus + # You should see something like the following as output: + gbbhackprometheus-prometheus-kube-state-metrics ClusterIP None 80/TCP 3m + gbbhackprometheus-prometheus-node-exporter ClusterIP None 9100/TCP 3m + gbbhackprometheus-prometheus-server LoadBalancer 10.0.212.145 52.168.100.25 9090:32340/TCP 3m + ``` + +## Install Grafana +Grafana is a dashboard visualization tool that can use all kinds of data sources. In our case, Prometheus will be used as the data source. + +1. Install Grafana using Helm CLI + The following command will install Grafana into the K8s cluster with a few custom settings to make it easier to access. + * We are setting the default username and password to **admin** to make it easier to remember + * We are also setting the service type to **LoadBalancer** to expose the service outside of the cluster and make it accessible via the Internet + + ``` + helm install --name gbbhackgrafana stable/grafana --version 0.5.1 --set server.service.type=LoadBalancer,server.adminUser=admin,server.adminPassword=admin,server.image=grafana/grafana:4.6.3,server.persistentVolume.enabled=false + ``` + +2. Validate that Grafana was Installed + ``` + kubectl get pods | grep grafana + # You should see something like the following as output: + hgrafana-grafana-855db78dc4-pnzth 1/1 Running 0 2h + ``` + + ``` + kubectl get svc | grep grafana + # You should see something like the following as output, take note of the **EXTERNAL-IP column**: + khgrafana-grafana LoadBalancer 10.0.163.226 "52.226.75.38" 80:31476/TCP 2h + ``` + +3. Test Grafana UI Comes Up +Use the EXTERNAL-IP value from the previous step and put that into your browser: + * eg. http://52.226.75.38, EXTERNAL-IP column from above. You should see something like the following come up, be patient it will take a moment or two: + + ![](img/8-grafana_default.png) + +## Setting up Grafana +1. Log into Grafana Dashboard using **admin** for the username and password + * You should see something like the following: + + ![](img/8-grafana_loggedin.png) + +2. Add Prometheus as a Data Source + * If you recall from above, we exposed a number of K8s services, one of those services was the Prometheus Server. We are going to use that Service endpoint in our Data Service configuration. The Add Data Source screen should look something like the below screen shot. + + > Use `http://gbbhackprometheus-prometheus-server:9090` for the URL in the HTTP settings. + + ![](img/8-grafana_datasource.png) + +3. Validate Prometheus Data Source + * Once you have filled in the values similar to the screenshot above, click the **Add** button and ensure no errors come back. + +4. Add K8s Monitoring Dashboard to Grafana + * After the datasource has been added, it is now time to add a dashboard. Grafana dashboards can be shared on Grafana.com. Go to import dashboards via the menu in the top left. + + ![](img/8-grafana_dashboardimport.png) + + * Click on the **Upload File** button and browse to the `grafana-dashboard.json` in the `helper-files` directory. You can also paste the contents of the json into the text box. 
+ + ![](img/8-grafana_dashboardid.png) + + * Set the datasource dropdown to the "AKSPrometheus" that was created in the previous step. + + ![](img/8-grafana_dashboardsave.png) + + * Click the **Import** button. + + ![](img/8-grafana_k8sdashboard.png) + + You should now have Prometheus and Grafana running in your Azure Kubernetes Service cluster and be able to see the Grafana Dashboard. diff --git a/labs/day1-labs/05-cluster-scaling.md b/labs/day1-labs/05-cluster-scaling.md new file mode 100644 index 0000000..c555f04 --- /dev/null +++ b/labs/day1-labs/05-cluster-scaling.md @@ -0,0 +1,83 @@ +# Working with Azure Kubernetes Service Cluster Scaling + +Imagine a scenario where your realize that your existing cluster is at capacity and you need to scale it out to add more nodes in order to increase capacity and be able to deploy more PODS. + +## Scale Application +1. Check to see current number of pods running via Grafana Dashboard. +* Go to the same Grafana Dashboard from lab 6 and look at the **Pods Running Count** section. You will see the total count of Pods and the various phases they are in. + +![](img/9-grafana_podsrunning.png) + +2. Check to see current number of heroes pods running via K8s CLI. +```bash +kubectl get pods | grep heroes +# You should see something like the following as output (one replica of each pod): +heroes-api-deploy-1165643395-fwjtm 1/1 Running 0 2d +heroes-db-deploy-839157328-4656j 1/1 Running 0 2d +heroes-web-1677855039-8t57k 1/1 Running 0 2d +``` +3. Scale out the Web application +* To simulate a real-world scenario we are going to scale the web app to handle increased load. +```bash +# This command will create multiple replicas of the heroes-web pod to simulate additional load on the cluster. +kubectl scale deploy/heroes-web-deploy --replicas=4 +``` +4. Check to see number of pods now running via Grafana Dashboard + +![](img/9-grafana_podsrunning.png) + +5. Check to see number of heroes pods running via kubectl +```bash +kubectl get pod | grep heroes +# You should see something like the following as output (more than one heroes-web pod and some of them in different states): +NAME READY STATUS RESTARTS AGE +heroes-web-3683626428-4m1v4 0/1 Pending 0 2m +heroes-web-3683626428-hcs49 1/1 Running 0 4m +heroes-web-3683626428-z1t1j 0/1 Pending 0 2m +heroes-web-3683626428-zxp2s 1/1 Running 0 2m +``` + +6. Check up on Pods Running in Grafana dashboard +* As you can see we have a number of pods that are in the pending state which means they are trying to be scheduled to run. In this scenario the cluster is out of capacity so they are not able to be scheduled. + +![](img/9-grafana_podspending.png) + + +## Scale K8s Cluster +1. Check to see number of current nodes running. +```bash +kubectl get nodes +# You should see something like the following as output (there is one node in the cluster): +NAME STATUS ROLES AGE VERSION +aks-nodepool1-42552728-0 Ready agent 4h v1.7.7 +aks-nodepool1-42552728-1 Ready agent 4h v1.7.7 +``` +2. Scale out AKS cluster to accomodate the demand +```bash +# set these values to match yours (the cluster and the RG are the same name) +RESOURCE_GROUP_NAME=$(az group list | jq '.[0]."name"' -r) +AKS_CLUSTER_NAME="${RESOURCE_GROUP_NAME//_}" + +az aks scale -g $RESOURCE_GROUP_NAME -n $AKS_CLUSTER_NAME --node-count 4 +``` + +> Note this may take some time. Good time to get some coffee. + +3. 
Check to see if the new nodes are deployed and "Ready" +```bash +kubectl get nodes +# You should see something like the following as output (there are now 4 nodes in the cluster): +NAME STATUS ROLES AGE VERSION +aks-nodepool1-42552728-0 Ready agent 5h v1.7.7 +aks-nodepool1-42552728-1 Ready agent 5h v1.7.7 +aks-nodepool1-42552728-2 Ready agent 7m v1.7.7 +aks-nodepool1-42552728-3 Ready agent 7m v1.7.7 +``` + +4. Re-visit Grafana Dasboard to validate cluster scale is working. +* Take a look at the **Pods Pending Count** again and you should see that after a few minutes the number of pending pods is going down. + +![](img/9-grafana_podsscaling.png) + + +You now have additional node capacity in your Azure Kubernetes Service cluster to be able to provision more pods. diff --git a/labs/day1-labs/06-cluster-upgrading.md b/labs/day1-labs/06-cluster-upgrading.md new file mode 100644 index 0000000..a203d47 --- /dev/null +++ b/labs/day1-labs/06-cluster-upgrading.md @@ -0,0 +1,98 @@ +# Upgrade an Azure Kubernetes Service (AKS) cluster + +Azure Container Service (AKS) makes it easy to perform common management tasks including upgrading Kubernetes clusters. + +## Upgrade an AKS cluster + +Before upgrading a cluster, use the `az aks get-upgrades` command to check which Kubernetes releases are available for upgrade. + +```azurecli-interactive +az aks get-upgrades --name $CLUSTER_NAME --resource-group $NAME --output table +``` + +Output: + +```console +Name ResourceGroup MasterVersion MasterUpgrades NodePoolVersion NodePoolUpgrades +------- --------------- --------------- ------------------- ------------------ ------------------- +default myResourceGroup 1.7.7 1.8.2, 1.7.9, 1.8.1 1.7.7 1.8.2, 1.7.9, 1.8.1 +``` + +We have three versions available for upgrade: 1.7.9, 1.8.1 and 1.8.2. We can use the `az aks upgrade` command to upgrade to the latest available version. During the upgrade process, nodes are carefully [cordoned and drained][kubernetes-drain] to minimize disruption to running applications. Before initiating a cluster upgrade, ensure that you have enough additional compute capacity to handle your workload as cluster nodes are added and removed. + +```azurecli-interactive +az aks upgrade --name $CLUSTER_NAME --resource-group $NAME --kubernetes-version 1.8.2 +``` + +Output: + +```json +{ + "id": "/subscriptions/4f48eeae-9347-40c5-897b-46af1b8811ec/resourcegroups/myResourceGroup/providers/Microsoft.ContainerService/managedClusters/myK8sCluster", + "location": "eastus", + "name": "myK8sCluster", + "properties": { + "accessProfiles": { + "clusterAdmin": { + "kubeConfig": "..." + }, + "clusterUser": { + "kubeConfig": "..." + } + }, + "agentPoolProfiles": [ + { + "count": 1, + "dnsPrefix": null, + "fqdn": null, + "name": "myK8sCluster", + "osDiskSizeGb": null, + "osType": "Linux", + "ports": null, + "storageProfile": "ManagedDisks", + "vmSize": "Standard_D2_v2", + "vnetSubnetId": null + } + ], + "dnsPrefix": "myK8sClust-myResourceGroup-4f48ee", + "fqdn": "myk8sclust-myresourcegroup-4f48ee-406cc140.hcp.eastus.azmk8s.io", + "kubernetesVersion": "1.8.2", + "linuxProfile": { + "adminUsername": "azureuser", + "ssh": { + "publicKeys": [ + { + "keyData": "..." 
+ } + ] + } + }, + "provisioningState": "Succeeded", + "servicePrincipalProfile": { + "clientId": "e70c1c1c-0ca4-4e0a-be5e-aea5225af017", + "keyVaultSecretRef": null, + "secret": null + } + }, + "resourceGroup": "myResourceGroup", + "tags": null, + "type": "Microsoft.ContainerService/ManagedClusters" +} +``` + +You can now confirm the upgrade was successful with the `az aks show` command. + +```azurecli-interactive +az aks show --name $CLUSTER_NAME --resource-group $NAME --output table +``` + +Output: + +```json +Name Location ResourceGroup KubernetesVersion ProvisioningState Fqdn +------------ ---------- --------------- ------------------- ------------------- ---------------------------------------------------------------- +myK8sCluster eastus myResourceGroup 1.8.2 Succeeded myk8sclust-myresourcegroup-3762d8-2f6ca801.hcp.eastus.azmk8s.io +``` + +## Attribution: +Content originally created by @gabrtv et al. from [this](https://docs.microsoft.com/en-us/azure/aks/upgrade-cluster) Azure Doc diff --git a/labs/day1-labs/img/1.png b/labs/day1-labs/img/1.png new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/labs/day1-labs/img/1.png @@ -0,0 +1 @@ + diff --git a/labs/day1-labs/img/8-grafana_dashboardid.png b/labs/day1-labs/img/8-grafana_dashboardid.png new file mode 100644 index 0000000..29e965e Binary files /dev/null and b/labs/day1-labs/img/8-grafana_dashboardid.png differ diff --git a/labs/day1-labs/img/8-grafana_dashboardimport.png b/labs/day1-labs/img/8-grafana_dashboardimport.png new file mode 100644 index 0000000..9e012ae Binary files /dev/null and b/labs/day1-labs/img/8-grafana_dashboardimport.png differ diff --git a/labs/day1-labs/img/8-grafana_dashboardsave.png b/labs/day1-labs/img/8-grafana_dashboardsave.png new file mode 100644 index 0000000..e1f183a Binary files /dev/null and b/labs/day1-labs/img/8-grafana_dashboardsave.png differ diff --git a/labs/day1-labs/img/8-grafana_datasource.png b/labs/day1-labs/img/8-grafana_datasource.png new file mode 100644 index 0000000..48f6ab7 Binary files /dev/null and b/labs/day1-labs/img/8-grafana_datasource.png differ diff --git a/labs/day1-labs/img/8-grafana_default.png b/labs/day1-labs/img/8-grafana_default.png new file mode 100644 index 0000000..d545c63 Binary files /dev/null and b/labs/day1-labs/img/8-grafana_default.png differ diff --git a/labs/day1-labs/img/8-grafana_k8sdashboard.png b/labs/day1-labs/img/8-grafana_k8sdashboard.png new file mode 100644 index 0000000..418b392 Binary files /dev/null and b/labs/day1-labs/img/8-grafana_k8sdashboard.png differ diff --git a/labs/day1-labs/img/8-grafana_loggedin.png b/labs/day1-labs/img/8-grafana_loggedin.png new file mode 100644 index 0000000..47adec9 Binary files /dev/null and b/labs/day1-labs/img/8-grafana_loggedin.png differ diff --git a/labs/day1-labs/img/9-grafana_podspending.png b/labs/day1-labs/img/9-grafana_podspending.png new file mode 100644 index 0000000..4949e22 Binary files /dev/null and b/labs/day1-labs/img/9-grafana_podspending.png differ diff --git a/labs/day1-labs/img/9-grafana_podsrunning.png b/labs/day1-labs/img/9-grafana_podsrunning.png new file mode 100644 index 0000000..83de884 Binary files /dev/null and b/labs/day1-labs/img/9-grafana_podsrunning.png differ diff --git a/labs/day1-labs/img/9-grafana_podsscaling.png b/labs/day1-labs/img/9-grafana_podsscaling.png new file mode 100644 index 0000000..7851c1c Binary files /dev/null and b/labs/day1-labs/img/9-grafana_podsscaling.png differ diff --git a/labs/day1-labs/img/cloud-shell-start.png 
b/labs/day1-labs/img/cloud-shell-start.png new file mode 100644 index 0000000..584036e Binary files /dev/null and b/labs/day1-labs/img/cloud-shell-start.png differ diff --git a/labs/day1-labs/img/cloud-show-advanced.png b/labs/day1-labs/img/cloud-show-advanced.png new file mode 100644 index 0000000..aa97403 Binary files /dev/null and b/labs/day1-labs/img/cloud-show-advanced.png differ diff --git a/labs/day1-labs/img/cloud-storage-config.png b/labs/day1-labs/img/cloud-storage-config.png new file mode 100644 index 0000000..5572ab4 Binary files /dev/null and b/labs/day1-labs/img/cloud-storage-config.png differ diff --git a/labs/day1-labs/img/cloudshell.png b/labs/day1-labs/img/cloudshell.png new file mode 100644 index 0000000..3183d8e Binary files /dev/null and b/labs/day1-labs/img/cloudshell.png differ diff --git a/labs/day1-labs/img/cosmos_create_collection.png b/labs/day1-labs/img/cosmos_create_collection.png new file mode 100644 index 0000000..da05075 Binary files /dev/null and b/labs/day1-labs/img/cosmos_create_collection.png differ diff --git a/labs/day1-labs/img/cosmos_data_explorer.png b/labs/day1-labs/img/cosmos_data_explorer.png new file mode 100644 index 0000000..cb8ec9d Binary files /dev/null and b/labs/day1-labs/img/cosmos_data_explorer.png differ diff --git a/labs/day1-labs/img/creating_cosmos.png b/labs/day1-labs/img/creating_cosmos.png new file mode 100644 index 0000000..8206807 Binary files /dev/null and b/labs/day1-labs/img/creating_cosmos.png differ diff --git a/labs/day1-labs/img/finding_cosmos.png b/labs/day1-labs/img/finding_cosmos.png new file mode 100644 index 0000000..c4562f9 Binary files /dev/null and b/labs/day1-labs/img/finding_cosmos.png differ diff --git a/labs/day1-labs/img/spektra-ready.png b/labs/day1-labs/img/spektra-ready.png new file mode 100644 index 0000000..93cfc84 Binary files /dev/null and b/labs/day1-labs/img/spektra-ready.png differ diff --git a/labs/day1-labs/img/spektra-register.png b/labs/day1-labs/img/spektra-register.png new file mode 100644 index 0000000..64ac7ce Binary files /dev/null and b/labs/day1-labs/img/spektra-register.png differ diff --git a/labs/day1-labs/img/ui_nodes.png b/labs/day1-labs/img/ui_nodes.png new file mode 100644 index 0000000..a099454 Binary files /dev/null and b/labs/day1-labs/img/ui_nodes.png differ diff --git a/labs/day1-labs/img/ui_pods.png b/labs/day1-labs/img/ui_pods.png new file mode 100644 index 0000000..18fb22b Binary files /dev/null and b/labs/day1-labs/img/ui_pods.png differ diff --git a/labs/helper-files/grafana-dashboard.json b/labs/helper-files/grafana-dashboard.json new file mode 100644 index 0000000..10212e2 --- /dev/null +++ b/labs/helper-files/grafana-dashboard.json @@ -0,0 +1,1265 @@ +{ + "__inputs": [ + { + "name": "DS_GBBPROMETHEUS", + "label": "GBBPrometheus", + "description": "", + "type": "datasource", + "pluginId": "prometheus", + "pluginName": "Prometheus" + } + ], + "__requires": [ + { + "type": "grafana", + "id": "grafana", + "name": "Grafana", + "version": "4.6.3" + }, + { + "type": "panel", + "id": "graph", + "name": "Graph", + "version": "" + }, + { + "type": "datasource", + "id": "prometheus", + "name": "Prometheus", + "version": "1.0.0" + }, + { + "type": "panel", + "id": "singlestat", + "name": "Singlestat", + "version": "" + } + ], + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": "-- Grafana --", + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations & Alerts", + "type": "dashboard" + } + ] + }, + "description": "Monitors 
Kubernetes cluster using Prometheus. Shows overall cluster CPU / Memory / Filesystem usage as well as a count of the pods and their current phase. Uses cAdvisor metrics only.", + "editable": true, + "gnetId": 1621, + "graphTooltip": 0, + "hideControls": false, + "id": null, + "links": [], + "refresh": "10s", + "rows": [ + { + "collapse": false, + "height": "250px", + "panels": [ + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": true, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "${DS_GBBPROMETHEUS}", + "editable": true, + "error": false, + "format": "percent", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": true, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "height": "180px", + "id": 4, + "interval": null, + "links": [], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 4, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum (container_memory_working_set_bytes{id=\"/\",kubernetes_io_hostname=~\"^$Node$\"}) / sum (machine_memory_bytes{kubernetes_io_hostname=~\"^$Node$\"}) * 100", + "interval": "10s", + "intervalFactor": 1, + "refId": "A", + "step": 10 + } + ], + "thresholds": "65, 90", + "title": "Cluster memory usage", + "transparent": false, + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": true, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "${DS_GBBPROMETHEUS}", + "decimals": 2, + "editable": true, + "error": false, + "format": "percent", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": true, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "height": "180px", + "id": 6, + "interval": null, + "links": [], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 4, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum (rate (container_cpu_usage_seconds_total{id=\"/\",kubernetes_io_hostname=~\"^$Node$\"}[1m])) / sum (machine_cpu_cores{kubernetes_io_hostname=~\"^$Node$\"}) * 100", + "interval": "10s", + "intervalFactor": 1, + "refId": "A", + "step": 10 + } + ], + "thresholds": "65, 90", + "title": "Cluster CPU usage (1m avg)", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": 
true, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "${DS_GBBPROMETHEUS}", + "decimals": 2, + "editable": true, + "error": false, + "format": "percent", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": true, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "height": "180px", + "id": 7, + "interval": null, + "links": [], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 4, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum (container_fs_usage_bytes{device=~\"^/dev/.*$\",id=\"/\",kubernetes_io_hostname=~\"^$Node$\"}) / sum (container_fs_limit_bytes{device=~\"^/dev/.*$\",id=\"/\",kubernetes_io_hostname=~\"^$Node$\"}) * 100", + "interval": "10s", + "intervalFactor": 1, + "legendFormat": "", + "metric": "", + "refId": "A", + "step": 10 + } + ], + "thresholds": "65, 90", + "title": "Cluster filesystem usage", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "${DS_GBBPROMETHEUS}", + "decimals": 2, + "editable": true, + "error": false, + "format": "bytes", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "height": "1px", + "id": 9, + "interval": null, + "links": [], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "20%", + "prefix": "", + "prefixFontSize": "20%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 2, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum (container_memory_working_set_bytes{id=\"/\",kubernetes_io_hostname=~\"^$Node$\"})", + "interval": "10s", + "intervalFactor": 1, + "refId": "A", + "step": 10 + } + ], + "thresholds": "", + "title": "Used", + "type": "singlestat", + "valueFontSize": "50%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "${DS_GBBPROMETHEUS}", + "decimals": 2, + "editable": true, + "error": false, + "format": "bytes", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "height": "1px", + "id": 10, + "interval": null, + "links": [], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to 
text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 2, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum (machine_memory_bytes{kubernetes_io_hostname=~\"^$Node$\"})", + "interval": "10s", + "intervalFactor": 1, + "refId": "A", + "step": 10 + } + ], + "thresholds": "", + "title": "Total", + "type": "singlestat", + "valueFontSize": "50%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "${DS_GBBPROMETHEUS}", + "decimals": 2, + "editable": true, + "error": false, + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "height": "1px", + "id": 11, + "interval": null, + "links": [], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": " cores", + "postfixFontSize": "30%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 2, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum (rate (container_cpu_usage_seconds_total{id=\"/\",kubernetes_io_hostname=~\"^$Node$\"}[1m]))", + "interval": "10s", + "intervalFactor": 1, + "refId": "A", + "step": 10 + } + ], + "thresholds": "", + "title": "Used", + "type": "singlestat", + "valueFontSize": "50%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "${DS_GBBPROMETHEUS}", + "decimals": 2, + "editable": true, + "error": false, + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "height": "1px", + "id": 12, + "interval": null, + "links": [], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": " cores", + "postfixFontSize": "30%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 2, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum (machine_cpu_cores{kubernetes_io_hostname=~\"^$Node$\"})", + "interval": "10s", + "intervalFactor": 1, + "refId": "A", + "step": 10 + } + ], + 
"thresholds": "", + "title": "Total", + "type": "singlestat", + "valueFontSize": "50%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "${DS_GBBPROMETHEUS}", + "decimals": 2, + "editable": true, + "error": false, + "format": "bytes", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "height": "1px", + "id": 13, + "interval": null, + "links": [], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 2, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum (container_fs_usage_bytes{device=~\"^/dev/.*$\",id=\"/\",kubernetes_io_hostname=~\"^$Node$\"})", + "interval": "10s", + "intervalFactor": 1, + "refId": "A", + "step": 10 + } + ], + "thresholds": "", + "title": "Used", + "type": "singlestat", + "valueFontSize": "50%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "${DS_GBBPROMETHEUS}", + "decimals": 2, + "editable": true, + "error": false, + "format": "bytes", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "height": "1px", + "id": 14, + "interval": null, + "links": [], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 2, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum (container_fs_limit_bytes{device=~\"^/dev/.*$\",id=\"/\",kubernetes_io_hostname=~\"^$Node$\"})", + "interval": "10s", + "intervalFactor": 1, + "refId": "A", + "step": 10 + } + ], + "thresholds": "", + "title": "Total", + "type": "singlestat", + "valueFontSize": "50%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Total usage", + "titleSize": "h6" + }, + { + "collapse": false, + "height": 250, + "panels": [ + { + "aliasColors": {}, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "${DS_GBBPROMETHEUS}", + "fill": 3, + "id": 33, + "legend": { + "alignAsTable": true, + "avg": false, + "current": true, + "max": true, + "min": 
true, + "show": true, + "total": false, + "values": true + }, + "lines": true, + "linewidth": 2, + "links": [], + "nullPointMode": "null as zero", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "seriesOverrides": [], + "spaceLength": 10, + "span": 12, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "sum (kube_pod_status_phase{}) by (phase)", + "format": "time_series", + "hide": false, + "interval": "", + "intervalFactor": 2, + "legendFormat": "{{ phase }}", + "metric": "kube_pod_status_phase", + "refId": "A", + "step": 10 + }, + { + "expr": "kubelet_running_pod_count{kubernetes_io_role =~ \".*node.*\"}", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "{{ instance }}", + "refId": "B", + "step": 10 + } + ], + "thresholds": [], + "timeFrom": null, + "timeShift": null, + "title": "Pods Running Count", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [] + }, + "yaxes": [ + { + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": "0", + "show": true + }, + { + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + } + ] + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6" + }, + { + "collapse": true, + "height": "200px", + "panels": [ + { + "aliasColors": {}, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "${DS_GBBPROMETHEUS}", + "decimals": 2, + "editable": true, + "error": false, + "fill": 1, + "grid": {}, + "height": "200px", + "id": 32, + "legend": { + "alignAsTable": false, + "avg": true, + "current": true, + "max": false, + "min": false, + "rightSide": false, + "show": false, + "sideWidth": 200, + "sort": "current", + "sortDesc": true, + "total": false, + "values": true + }, + "lines": true, + "linewidth": 2, + "links": [], + "nullPointMode": "connected", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "seriesOverrides": [], + "spaceLength": 10, + "span": 12, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "sum (rate (container_network_receive_bytes_total{kubernetes_io_hostname=~\"^$Node$\"}[1m]))", + "interval": "10s", + "intervalFactor": 1, + "legendFormat": "Received", + "metric": "network", + "refId": "A", + "step": 10 + }, + { + "expr": "- sum (rate (container_network_transmit_bytes_total{kubernetes_io_hostname=~\"^$Node$\"}[1m]))", + "interval": "10s", + "intervalFactor": 1, + "legendFormat": "Sent", + "metric": "network", + "refId": "B", + "step": 10 + } + ], + "thresholds": [], + "timeFrom": null, + "timeShift": null, + "title": "Network I/O pressure", + "tooltip": { + "msResolution": false, + "shared": true, + "sort": 0, + "value_type": "cumulative" + }, + "transparent": false, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [] + }, + "yaxes": [ + { + "format": "Bps", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + }, + { + "format": "Bps", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": false + } + ] + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Network I/O pressure", + "titleSize": "h6" + }, + { + "collapse": true, + "height": 
"250px", + "panels": [ + { + "aliasColors": {}, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "${DS_GBBPROMETHEUS}", + "decimals": 3, + "editable": true, + "error": false, + "fill": 0, + "grid": {}, + "height": "", + "id": 17, + "legend": { + "alignAsTable": true, + "avg": true, + "current": true, + "max": false, + "min": false, + "rightSide": true, + "show": true, + "sort": "current", + "sortDesc": true, + "total": false, + "values": true + }, + "lines": true, + "linewidth": 2, + "links": [], + "nullPointMode": "connected", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "seriesOverrides": [], + "spaceLength": 10, + "span": 12, + "stack": false, + "steppedLine": true, + "targets": [ + { + "expr": "sum (rate (container_cpu_usage_seconds_total{image!=\"\",name=~\"^k8s_.*\",kubernetes_io_hostname=~\"^$Node$\"}[1m])) by (pod_name)", + "interval": "10s", + "intervalFactor": 1, + "legendFormat": "{{ pod_name }}", + "metric": "container_cpu", + "refId": "A", + "step": 10 + } + ], + "thresholds": [], + "timeFrom": null, + "timeShift": null, + "title": "Pods CPU usage (1m avg)", + "tooltip": { + "msResolution": true, + "shared": true, + "sort": 2, + "value_type": "cumulative" + }, + "transparent": false, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [] + }, + "yaxes": [ + { + "format": "none", + "label": "cores", + "logBase": 1, + "max": null, + "min": null, + "show": true + }, + { + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": false + } + ] + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Pods CPU usage", + "titleSize": "h6" + }, + { + "collapse": true, + "height": "250px", + "panels": [ + { + "aliasColors": {}, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "${DS_GBBPROMETHEUS}", + "decimals": 2, + "editable": true, + "error": false, + "fill": 0, + "grid": {}, + "id": 25, + "legend": { + "alignAsTable": true, + "avg": true, + "current": true, + "max": false, + "min": false, + "rightSide": true, + "show": true, + "sideWidth": 200, + "sort": "current", + "sortDesc": true, + "total": false, + "values": true + }, + "lines": true, + "linewidth": 2, + "links": [], + "nullPointMode": "connected", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "seriesOverrides": [], + "spaceLength": 10, + "span": 12, + "stack": false, + "steppedLine": true, + "targets": [ + { + "expr": "sum (container_memory_working_set_bytes{image!=\"\",name=~\"^k8s_.*\",kubernetes_io_hostname=~\"^$Node$\"}) by (pod_name)", + "interval": "10s", + "intervalFactor": 1, + "legendFormat": "{{ pod_name }}", + "metric": "container_memory_usage:sort_desc", + "refId": "A", + "step": 10 + } + ], + "thresholds": [], + "timeFrom": null, + "timeShift": null, + "title": "Pods memory usage", + "tooltip": { + "msResolution": false, + "shared": true, + "sort": 2, + "value_type": "cumulative" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [] + }, + "yaxes": [ + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + }, + { + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": false + } + ] + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": 
"Pods memory usage", + "titleSize": "h6" + } + ], + "schemaVersion": 14, + "style": "dark", + "tags": [ + "kubernetes" + ], + "templating": { + "list": [ + { + "allValue": ".*", + "current": {}, + "datasource": "${DS_GBBPROMETHEUS}", + "hide": 0, + "includeAll": true, + "label": null, + "multi": false, + "name": "Node", + "options": [], + "query": "label_values(kubernetes_io_hostname)", + "refresh": 1, + "regex": "", + "sort": 0, + "tagValuesQuery": "", + "tags": [], + "tagsQuery": "", + "type": "query", + "useTags": false + } + ] + }, + "time": { + "from": "now-5m", + "to": "now" + }, + "timepicker": { + "refresh_intervals": [ + "5s", + "10s", + "30s", + "1m", + "5m", + "15m", + "30m", + "1h", + "2h", + "1d" + ], + "time_options": [ + "5m", + "15m", + "1h", + "6h", + "12h", + "24h", + "2d", + "7d", + "30d" + ] + }, + "timezone": "browser", + "title": "GBB Hackfest Dashboard (Monitoring with Prometheus & Grafana)", + "version": 1 +} \ No newline at end of file diff --git a/labs/helper-files/heroes-db.yaml b/labs/helper-files/heroes-db.yaml new file mode 100644 index 0000000..40fa74d --- /dev/null +++ b/labs/helper-files/heroes-db.yaml @@ -0,0 +1,52 @@ +apiVersion: v1 +kind: Service +metadata: + name: mongodb + labels: + name: mongodb +spec: + type: ClusterIP + ports: + - name: http + port: 27017 + targetPort: 27017 + selector: + name: heroes-db +--- +apiVersion: extensions/v1beta1 +kind: Deployment +metadata: + name: heroes-db-deploy + labels: + name: heroes-db +spec: + strategy: + rollingUpdate: + maxSurge: 1 + maxUnavailable: 1 + type: RollingUpdate + template: + metadata: + labels: + name: heroes-db + spec: + imagePullSecrets: + - name: acr-secret + containers: + - image: /azureworkshop/rating-db:v1 + name: heroes-db-cntnr + resources: + requests: + cpu: "20m" + memory: "55M" + ports: + - containerPort: 27017 + name: heroes-db + volumeMounts: + - mountPath: /data + name: data + imagePullPolicy: Always + volumes: + - name: data + emptyDir: {} + restartPolicy: Always \ No newline at end of file diff --git a/labs/helper-files/heroes-web-api.yaml b/labs/helper-files/heroes-web-api.yaml new file mode 100644 index 0000000..efcf620 --- /dev/null +++ b/labs/helper-files/heroes-web-api.yaml @@ -0,0 +1,113 @@ +apiVersion: v1 +kind: Service +metadata: + name: api + labels: + name: api +spec: + type: LoadBalancer + ports: + - name: http + port: 3000 + targetPort: 3000 + selector: + name: heroes-api +--- +apiVersion: extensions/v1beta1 +kind: Deployment +metadata: + name: heroes-api-deploy + labels: + name: heroes-api +spec: + replicas: 1 + strategy: + rollingUpdate: + maxSurge: 1 + maxUnavailable: 1 + type: RollingUpdate + template: + metadata: + labels: + name: heroes-api + spec: + imagePullSecrets: + - name: acr-secret + containers: + - image: /azureworkshop/rating-api:v1 + name: heroes-api-cntnr + resources: + requests: + cpu: "20m" + memory: "55M" + env: + - name: MONGODB_URI + value: mongodb://mongodb:27017/webratings + ports: + - containerPort: 3000 + name: heroes-api + imagePullPolicy: Always + restartPolicy: Always +--- +apiVersion: v1 +kind: Service +metadata: + name: web + labels: + name: web +spec: + type: LoadBalancer + ports: + - name: http + port: 8080 + targetPort: 8080 + selector: + name: heroes-web +--- +apiVersion: extensions/v1beta1 +kind: Deployment +metadata: + name: heroes-web-deploy + labels: + name: heroes-web +spec: + replicas: 1 + strategy: + rollingUpdate: + maxSurge: 1 + maxUnavailable: 1 + type: RollingUpdate + template: + metadata: + labels: + name: heroes-web + spec: 
+      imagePullSecrets:
+        - name: acr-secret
+      containers:
+      - image: /azureworkshop/rating-web:v1
+        name: heroes-web-cntnr
+        resources:
+          requests:
+            cpu: "0.5"
+            memory: "1Gi"
+        env:
+        - name: API
+          value: http://api:3000/
+        - name: KUBE_NODE_NAME
+          valueFrom:
+            fieldRef:
+              fieldPath: spec.nodeName
+        - name: KUBE_POD_NAME
+          valueFrom:
+            fieldRef:
+              fieldPath: metadata.name
+        - name: KUBE_POD_IP
+          valueFrom:
+            fieldRef:
+              fieldPath: status.podIP
+        ports:
+        - containerPort: 8080
+          name: heroes-web
+        imagePullPolicy: Always
+      restartPolicy: Always
\ No newline at end of file
diff --git a/labs/helper-files/jumpbox-setup.md b/labs/helper-files/jumpbox-setup.md
new file mode 100644
index 0000000..3cbc32e
--- /dev/null
+++ b/labs/helper-files/jumpbox-setup.md
@@ -0,0 +1,38 @@
+## Jumpbox updates
+
+Internal use only
+
+## Install Mongo
+
+* Terminal: `sudo vi /etc/yum.repos.d/mongodb-org.repo`
+
+* Add the following:
+
+```
+[mongodb-org-3.6]
+name=MongoDB Repository
+baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.6/x86_64/
+gpgcheck=1
+enabled=1
+gpgkey=https://www.mongodb.org/static/pgp/server-3.6.asc
+```
+
+* Run `sudo yum install mongodb-org`
+
+* Change the port to 27019 in the mongo config: `sudo vi /etc/mongod.conf`
+
+* Run `sudo systemctl start mongod`
+
+* Test the connection: `mongo localhost:27019`
+
+## Update tools (e.g. Azure CLI)
+
+Avoid Azure CLI version 2.0.24 since it has a bug; use version 2.0.23:
+`sudo yum install azure-cli-2.0.23-1.el7`
+
+## Clean up Docker
+
+```
+docker rm -f $(docker ps -a -q)
+docker rmi -f $(docker images -q)
+```
diff --git a/labs/helper-files/prometheus-configforhelm.yaml b/labs/helper-files/prometheus-configforhelm.yaml
new file mode 100644
index 0000000..2bb1fcb
--- /dev/null
+++ b/labs/helper-files/prometheus-configforhelm.yaml
@@ -0,0 +1,858 @@
+rbac:
+  create: false
+
+alertmanager:
+  ## If false, alertmanager will not be installed
+  ##
+  enabled: false
+
+  # Defines the serviceAccountName to use when `rbac.create=false`
+  serviceAccountName: default
+
+  ## alertmanager container name
+  ##
+  name: alertmanager
+
+  ## alertmanager container image
+  ##
+  image:
+    repository: prom/alertmanager
+    tag: v0.9.1
+    pullPolicy: IfNotPresent
+
+  ## Additional alertmanager container arguments
+  ##
+  extraArgs: {}
+
+  ## The URL prefix at which the container can be accessed. Useful in the case the '-web.external-url' includes a slug
+  ## so that the various internal URLs are still able to access as they are in the default case.
+ ## (Optional) + prefixURL: "" + + ## External URL which can access alertmanager + ## Maybe same with Ingress host name + baseURL: "" + + ## Additional alertmanager container environment variable + ## For instance to add a http_proxy + ## + extraEnv: {} + + ## ConfigMap override where fullname is {{.Release.Name}}-{{.Values.alertmanager.configMapOverrideName}} + ## Defining configMapOverrideName will cause templates/alertmanager-configmap.yaml + ## to NOT generate a ConfigMap resource + ## + configMapOverrideName: "" + + ingress: + ## If true, alertmanager Ingress will be created + ## + enabled: false + + ## alertmanager Ingress annotations + ## + annotations: {} + # kubernetes.io/ingress.class: nginx + # kubernetes.io/tls-acme: 'true' + + ## alertmanager Ingress hostnames + ## Must be provided if Ingress is enabled + ## + hosts: [] + # - alertmanager.domain.com + + ## alertmanager Ingress TLS configuration + ## Secrets must be manually created in the namespace + ## + tls: [] + # - secretName: prometheus-alerts-tls + # hosts: + # - alertmanager.domain.com + + ## Alertmanager Deployment Strategy type + # strategy: + # type: Recreate + + ## Node labels for alertmanager pod assignment + ## Ref: https://kubernetes.io/docs/user-guide/node-selection/ + ## + nodeSelector: {} + + persistentVolume: + ## If true, alertmanager will create/use a Persistent Volume Claim + ## If false, use emptyDir + ## + enabled: true + + ## alertmanager data Persistent Volume access modes + ## Must match those of existing PV or dynamic provisioner + ## Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/ + ## + accessModes: + - ReadWriteOnce + + ## alertmanager data Persistent Volume Claim annotations + ## + annotations: {} + + ## alertmanager data Persistent Volume existing claim name + ## Requires alertmanager.persistentVolume.enabled: true + ## If defined, PVC must be created manually before volume will be bound + existingClaim: "" + + ## alertmanager data Persistent Volume mount root path + ## + mountPath: /data + + ## alertmanager data Persistent Volume size + ## + size: 2Gi + + ## alertmanager data Persistent Volume Storage Class + ## If defined, storageClassName: + ## If set to "-", storageClassName: "", which disables dynamic provisioning + ## If undefined (the default) or set to null, no storageClassName spec is + ## set, choosing the default provisioner. 
(gp2 on AWS, standard on + ## GKE, AWS & OpenStack) + ## + # storageClass: "-" + + ## Subdirectory of alertmanager data Persistent Volume to mount + ## Useful if the volume's root directory is not empty + ## + subPath: "" + + ## Annotations to be added to alertmanager pods + ## + podAnnotations: {} + + replicaCount: 1 + + ## alertmanager resource requests and limits + ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/ + ## + resources: {} + # limits: + # cpu: 10m + # memory: 32Mi + # requests: + # cpu: 10m + # memory: 32Mi + + service: + annotations: {} + labels: {} + clusterIP: "" + + ## List of IP addresses at which the alertmanager service is available + ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips + ## + externalIPs: [] + + loadBalancerIP: "" + loadBalancerSourceRanges: [] + servicePort: 80 + # nodePort: 30000 + type: ClusterIP + +## Monitors ConfigMap changes and POSTs to a URL +## Ref: https://github.com/jimmidyson/configmap-reload +## +configmapReload: + ## configmap-reload container name + ## + name: configmap-reload + + ## configmap-reload container image + ## + image: + repository: jimmidyson/configmap-reload + tag: v0.1 + pullPolicy: IfNotPresent + + ## configmap-reload resource requests and limits + ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/ + ## + resources: {} + +kubeStateMetrics: + ## If false, kube-state-metrics will not be installed + ## + enabled: true + + # Defines the serviceAccountName to use when `rbac.create=false` + serviceAccountName: default + + ## kube-state-metrics container name + ## + name: kube-state-metrics + + ## kube-state-metrics container image + ## + image: + repository: k8s.gcr.io/kube-state-metrics + tag: v1.1.0-rc.0 + pullPolicy: IfNotPresent + + ## kube-state-metrics container arguments + ## + args: {} + + ## Node labels for kube-state-metrics pod assignment + ## Ref: https://kubernetes.io/docs/user-guide/node-selection/ + ## + nodeSelector: {} + + ## Annotations to be added to kube-state-metrics pods + ## + podAnnotations: {} + + replicaCount: 1 + + ## kube-state-metrics resource requests and limits + ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/ + ## + resources: {} + # limits: + # cpu: 10m + # memory: 16Mi + # requests: + # cpu: 10m + # memory: 16Mi + + service: + annotations: + prometheus.io/scrape: "true" + labels: {} + + clusterIP: None + + ## List of IP addresses at which the kube-state-metrics service is available + ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips + ## + externalIPs: [] + + loadBalancerIP: "" + loadBalancerSourceRanges: [] + servicePort: 80 + type: ClusterIP + +nodeExporter: + ## If false, node-exporter will not be installed + ## + enabled: true + + # Defines the serviceAccountName to use when `rbac.create=false` + serviceAccountName: default + + ## node-exporter container name + ## + name: node-exporter + + ## node-exporter container image + ## + image: + repository: prom/node-exporter + tag: v0.15.0 + pullPolicy: IfNotPresent + + ## Custom Update Strategy + ## + updateStrategy: + type: OnDelete + + ## Additional node-exporter container arguments + ## + extraArgs: {} + + ## Additional node-exporter hostPath mounts + ## + extraHostPathMounts: [] + # - name: textfile-dir + # mountPath: /srv/txt_collector + # hostPath: /var/lib/node-exporter + # readOnly: true + + ## Node tolerations for node-exporter scheduling to nodes with taints + ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/ + ## + tolerations: [] + 
# - key: "key" + # operator: "Equal|Exists" + # value: "value" + # effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)" + + ## Node labels for node-exporter pod assignment + ## Ref: https://kubernetes.io/docs/user-guide/node-selection/ + ## + nodeSelector: {} + + ## Annotations to be added to node-exporter pods + ## + podAnnotations: {} + + ## node-exporter resource limits & requests + ## Ref: https://kubernetes.io/docs/user-guide/compute-resources/ + ## + resources: {} + # limits: + # cpu: 200m + # memory: 50Mi + # requests: + # cpu: 100m + # memory: 30Mi + + service: + annotations: + prometheus.io/scrape: "true" + labels: {} + + clusterIP: None + + ## List of IP addresses at which the node-exporter service is available + ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips + ## + externalIPs: [] + + hostPort: 9100 + loadBalancerIP: "" + loadBalancerSourceRanges: [] + servicePort: 9100 + type: ClusterIP + +server: + ## Prometheus server container name + ## + name: server + + # Defines the serviceAccountName to use when `rbac.create=false` + serviceAccountName: default + + ## Prometheus server container image + ## + image: + repository: prom/prometheus + tag: v1.8.2 + pullPolicy: IfNotPresent + + ## (optional) alertmanager URL + ## only used if alertmanager.enabled = false + alertmanagerURL: "" + + ## The URL prefix at which the container can be accessed. Useful in the case the '-web.external-url' includes a slug + ## so that the various internal URLs are still able to access as they are in the default case. + ## (Optional) + prefixURL: "" + + ## External URL which can access alertmanager + ## Maybe same with Ingress host name + baseURL: "" + + ## Additional Prometheus server container arguments + ## + extraArgs: {} + + ## Additional Prometheus server hostPath mounts + ## + extraHostPathMounts: [] + # - name: certs-dir + # mountPath: /etc/kubernetes/certs + # hostPath: /etc/kubernetes/certs + # readOnly: true + + ## ConfigMap override where fullname is {{.Release.Name}}-{{.Values.server.configMapOverrideName}} + ## Defining configMapOverrideName will cause templates/server-configmap.yaml + ## to NOT generate a ConfigMap resource + ## + configMapOverrideName: "" + + ingress: + ## If true, Prometheus server Ingress will be created + ## + enabled: false + + ## Prometheus server Ingress annotations + ## + annotations: {} + # kubernetes.io/ingress.class: nginx + # kubernetes.io/tls-acme: 'true' + + ## Prometheus server Ingress hostnames + ## Must be provided if Ingress is enabled + ## + hosts: [] + # - prometheus.domain.com + + ## Prometheus server Ingress TLS configuration + ## Secrets must be manually created in the namespace + ## + tls: [] + # - secretName: prometheus-server-tls + # hosts: + # - prometheus.domain.com + + ## Server Deployment Strategy type + # strategy: + # type: Recreate + + ## Node tolerations for server scheduling to nodes with taints + ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/ + ## + tolerations: [] + # - key: "key" + # operator: "Equal|Exists" + # value: "value" + # effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)" + + ## Node labels for Prometheus server pod assignment + ## Ref: https://kubernetes.io/docs/user-guide/node-selection/ + nodeSelector: {} + + persistentVolume: + ## If true, Prometheus server will create/use a Persistent Volume Claim + ## If false, use emptyDir + ## + enabled: false + + ## Prometheus server data Persistent Volume access modes + ## Must match those of existing PV or dynamic 
provisioner + ## Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/ + ## + accessModes: + - ReadWriteOnce + + ## Prometheus server data Persistent Volume annotations + ## + annotations: {} + + ## Prometheus server data Persistent Volume existing claim name + ## Requires server.persistentVolume.enabled: true + ## If defined, PVC must be created manually before volume will be bound + existingClaim: "" + + ## Prometheus server data Persistent Volume mount root path + ## + mountPath: /data + + ## Prometheus server data Persistent Volume size + ## + size: 8Gi + + ## Prometheus server data Persistent Volume Storage Class + ## If defined, storageClassName: + ## If set to "-", storageClassName: "", which disables dynamic provisioning + ## If undefined (the default) or set to null, no storageClassName spec is + ## set, choosing the default provisioner. (gp2 on AWS, standard on + ## GKE, AWS & OpenStack) + ## + # storageClass: "-" + + ## Subdirectory of Prometheus server data Persistent Volume to mount + ## Useful if the volume's root directory is not empty + ## + subPath: "" + + ## Annotations to be added to Prometheus server pods + ## + podAnnotations: {} + # iam.amazonaws.com/role: prometheus + + replicaCount: 1 + + ## Prometheus server resource requests and limits + ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/ + ## + resources: {} + # limits: + # cpu: 500m + # memory: 512Mi + # requests: + # cpu: 500m + # memory: 512Mi + + service: + annotations: {} + labels: {} + clusterIP: "" + + ## List of IP addresses at which the Prometheus server service is available + ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips + ## + externalIPs: [] + + loadBalancerIP: "" + loadBalancerSourceRanges: [] + servicePort: 9090 + type: LoadBalancer + + ## Prometheus server pod termination grace period + ## + terminationGracePeriodSeconds: 300 + + ## Prometheus data retention period (i.e 360h) + ## + retention: "" + +pushgateway: + ## If false, pushgateway will not be installed + ## + enabled: false + + ## pushgateway container name + ## + name: pushgateway + + ## pushgateway container image + ## + image: + repository: prom/pushgateway + tag: v0.4.0 + pullPolicy: IfNotPresent + + ## Additional pushgateway container arguments + ## + extraArgs: {} + + ingress: + ## If true, pushgateway Ingress will be created + ## + enabled: false + + ## pushgateway Ingress annotations + ## + annotations: + # kubernetes.io/ingress.class: nginx + # kubernetes.io/tls-acme: 'true' + + ## pushgateway Ingress hostnames + ## Must be provided if Ingress is enabled + ## + hosts: [] + # - pushgateway.domain.com + + ## pushgateway Ingress TLS configuration + ## Secrets must be manually created in the namespace + ## + tls: [] + # - secretName: prometheus-alerts-tls + # hosts: + # - pushgateway.domain.com + + ## Node labels for pushgateway pod assignment + ## Ref: https://kubernetes.io/docs/user-guide/node-selection/ + ## + nodeSelector: {} + + ## Annotations to be added to pushgateway pods + ## + podAnnotations: {} + + replicaCount: 1 + + ## pushgateway resource requests and limits + ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/ + ## + resources: {} + # limits: + # cpu: 10m + # memory: 32Mi + # requests: + # cpu: 10m + # memory: 32Mi + + service: + annotations: + prometheus.io/probe: pushgateway + labels: {} + clusterIP: "" + + ## List of IP addresses at which the pushgateway service is available + ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips + ## + externalIPs: 
[] + + loadBalancerIP: "" + loadBalancerSourceRanges: [] + servicePort: 9091 + type: ClusterIP + +## alertmanager ConfigMap entries +## +alertmanagerFiles: + alertmanager.yml: |- + global: + # slack_api_url: '' + + receivers: + - name: default-receiver + # slack_configs: + # - channel: '@you' + # send_resolved: true + + route: + group_wait: 10s + group_interval: 5m + receiver: default-receiver + repeat_interval: 3h + +## Prometheus server ConfigMap entries +## +serverFiles: + alerts: "" + rules: "" + + prometheus.yml: |- + global: + scrape_interval: 5s + evaluation_interval: 5s + + scrape_configs: + - job_name: prometheus + static_configs: + - targets: + - localhost:9090 + + # A scrape configuration for running Prometheus on a Kubernetes cluster. + # This uses separate scrape configs for cluster components (i.e. API server, node) + # and services to allow each to use different authentication configs. + # + # Kubernetes labels will be added as Prometheus labels on metrics via the + # `labelmap` relabeling action. + + # Scrape config for API servers. + # + # Kubernetes exposes API servers as endpoints to the default/kubernetes + # service so this uses `endpoints` role and uses relabelling to only keep + # the endpoints associated with the default/kubernetes service using the + # default named port `https`. This works for single API server deployments as + # well as HA API server deployments. + - job_name: 'kubernetes-apiservers' + + kubernetes_sd_configs: + - role: endpoints + + # Default to scraping over https. If required, just disable this or change to + # `http`. + scheme: https + + # This TLS & bearer token file config is used to connect to the actual scrape + # endpoints for cluster components. This is separate to discovery auth + # configuration because discovery & scraping are two separate concerns in + # Prometheus. The discovery auth config is automatic if Prometheus runs inside + # the cluster. Otherwise, more config options have to be provided within the + # . + tls_config: + ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt + # If your node certificates are self-signed or use a different CA to the + # master CA, then disable certificate verification below. Note that + # certificate verification is an integral part of a secure infrastructure + # so this should only be disabled in a controlled environment. You can + # disable certificate verification by uncommenting the line below. + # + insecure_skip_verify: true + bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token + + # Keep only the default/kubernetes service endpoints for the https port. This + # will add targets for each API server which Kubernetes adds an endpoint to + # the default/kubernetes service. + relabel_configs: + - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name] + action: keep + regex: default;kubernetes;https + + - job_name: 'kubernetes-nodes' + + # Default to scraping over https. If required, just disable this or change to + # `http`. + scheme: https + + # This TLS & bearer token file config is used to connect to the actual scrape + # endpoints for cluster components. This is separate to discovery auth + # configuration because discovery & scraping are two separate concerns in + # Prometheus. The discovery auth config is automatic if Prometheus runs inside + # the cluster. Otherwise, more config options have to be provided within the + # . 
+ tls_config: + ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt + # If your node certificates are self-signed or use a different CA to the + # master CA, then disable certificate verification below. Note that + # certificate verification is an integral part of a secure infrastructure + # so this should only be disabled in a controlled environment. You can + # disable certificate verification by uncommenting the line below. + # + insecure_skip_verify: true + bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token + + kubernetes_sd_configs: + - role: node + + relabel_configs: + - action: labelmap + regex: __meta_kubernetes_node_label_(.+) + - target_label: __address__ + replacement: kubernetes.default.svc:443 + - source_labels: [__meta_kubernetes_node_name] + regex: (.+) + target_label: __metrics_path__ + replacement: /api/v1/nodes/${1}/proxy/metrics + + - job_name: 'kubernetes-cadvisor' + + # Default to scraping over https. If required, just disable this or change to + # `http`. + scheme: https + + # This TLS & bearer token file config is used to connect to the actual scrape + # endpoints for cluster components. This is separate to discovery auth + # configuration because discovery & scraping are two separate concerns in + # Prometheus. The discovery auth config is automatic if Prometheus runs inside + # the cluster. Otherwise, more config options have to be provided within the + # . + tls_config: + ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt + # If your node certificates are self-signed or use a different CA to the + # master CA, then disable certificate verification below. Note that + # certificate verification is an integral part of a secure infrastructure + # so this should only be disabled in a controlled environment. You can + # disable certificate verification by uncommenting the line below. + # + insecure_skip_verify: true + bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token + + kubernetes_sd_configs: + - role: node + + relabel_configs: + - action: labelmap + regex: __meta_kubernetes_node_label_(.+) + - target_label: __address__ + replacement: kubernetes.default.svc:443 + - source_labels: [__meta_kubernetes_node_name] + regex: (.+) + target_label: __metrics_path__ + replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor + + # Scrape config for service endpoints. + # + # The relabeling allows the actual service scrape endpoint to be configured + # via the following annotations: + # + # * `prometheus.io/scrape`: Only scrape services that have a value of `true` + # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need + # to set this to `https` & most likely set the `tls_config` of the scrape config. + # * `prometheus.io/path`: If the metrics path is not `/metrics` override this. + # * `prometheus.io/port`: If the metrics are exposed on a different port to the + # service then set this appropriately. + - job_name: 'kubernetes-service-endpoints' + + kubernetes_sd_configs: + - role: endpoints + + relabel_configs: + - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape] + action: keep + regex: true + - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme] + action: replace + target_label: __scheme__ + regex: (https?) 
+ - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path] + action: replace + target_label: __metrics_path__ + regex: (.+) + - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port] + action: replace + target_label: __address__ + regex: (.+)(?::\d+);(\d+) + replacement: ${1}:${2} + - action: labelmap + regex: __meta_kubernetes_service_label_(.+) + - source_labels: [__meta_kubernetes_namespace] + action: replace + target_label: kubernetes_namespace + - source_labels: [__meta_kubernetes_service_name] + action: replace + target_label: kubernetes_name + + - job_name: 'prometheus-pushgateway' + honor_labels: true + + kubernetes_sd_configs: + - role: service + + relabel_configs: + - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe] + action: keep + regex: pushgateway + + # Example scrape config for probing services via the Blackbox Exporter. + # + # The relabeling allows the actual service scrape endpoint to be configured + # via the following annotations: + # + # * `prometheus.io/probe`: Only probe services that have a value of `true` + - job_name: 'kubernetes-services' + + metrics_path: /probe + params: + module: [http_2xx] + + kubernetes_sd_configs: + - role: service + + relabel_configs: + - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe] + action: keep + regex: true + - source_labels: [__address__] + target_label: __param_target + - target_label: __address__ + replacement: blackbox + - source_labels: [__param_target] + target_label: instance + - action: labelmap + regex: __meta_kubernetes_service_label_(.+) + - source_labels: [__meta_kubernetes_namespace] + target_label: kubernetes_namespace + - source_labels: [__meta_kubernetes_service_name] + target_label: kubernetes_name + + # Example scrape config for pods + # + # The relabeling allows the actual pod scrape endpoint to be configured via the + # following annotations: + # + # * `prometheus.io/scrape`: Only scrape pods that have a value of `true` + # * `prometheus.io/path`: If the metrics path is not `/metrics` override this. + # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the default of `9102`. + - job_name: 'kubernetes-pods' + + kubernetes_sd_configs: + - role: pod + + relabel_configs: + - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape] + action: keep + regex: true + - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path] + action: replace + target_label: __metrics_path__ + regex: (.+) + - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port] + action: replace + regex: (.+):(?:\d+);(\d+) + replacement: ${1}:${2} + target_label: __address__ + - action: labelmap + regex: __meta_kubernetes_pod_label_(.+) + - source_labels: [__meta_kubernetes_namespace] + action: replace + target_label: kubernetes_namespace + - source_labels: [__meta_kubernetes_pod_name] + action: replace + target_label: kubernetes_pod_name + +networkPolicy: + ## Enable creation of NetworkPolicy resources. + ## + enabled: false \ No newline at end of file
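
The helper files above are referenced from the labs rather than applied verbatim: the `heroes-db.yaml` and `heroes-web-api.yaml` manifests expect an image pull secret named `acr-secret` and a registry prefix on each `image:` value, and `prometheus-configforhelm.yaml` is structured as a values override for the `stable/prometheus` Helm chart. The sketch below shows one plausible way to consume them; it assumes Helm v2 (`--name` syntax) and uses placeholder values (`<ACR_LOGIN_SERVER>`, `<SP_APP_ID>`, `<SP_PASSWORD>`, the release name `gbb-prometheus`) that are illustrative and not part of the lab content.

```
# Create the image pull secret referenced by the heroes manifests (the name must be "acr-secret").
# <ACR_LOGIN_SERVER>, <SP_APP_ID> and <SP_PASSWORD> are placeholders for your registry and service principal.
kubectl create secret docker-registry acr-secret \
  --docker-server=<ACR_LOGIN_SERVER> \
  --docker-username=<SP_APP_ID> \
  --docker-password=<SP_PASSWORD> \
  --docker-email=user@example.com

# Deploy the sample app. The image values in these files begin with "/azureworkshop/...",
# so the registry login server must be prepended before the pods can pull.
kubectl apply -f labs/helper-files/heroes-db.yaml
kubectl apply -f labs/helper-files/heroes-web-api.yaml

# Install Prometheus with the values override (Helm v2 syntax; the release name is illustrative).
helm install stable/prometheus --name gbb-prometheus -f labs/helper-files/prometheus-configforhelm.yaml

# The scrape configs in the values file discover targets by annotation, so a service can opt in
# to scraping without editing the Prometheus config (assuming it actually serves /metrics on that port):
kubectl annotate service api prometheus.io/scrape="true" prometheus.io/port="3000"
```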