CNS Prometheus and Grafana examples (#1366)

* cns prometheus examples

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>

* grafana samples

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
Evan Baker 2022-05-11 12:34:47 -05:00, committed by GitHub
Parent 2c77774852
Commit 8750b346ed
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
4 changed files with 1248 additions and 0 deletions

@ -0,0 +1,66 @@
# Azure CNS metrics
azure-cns exposes Prometheus metrics at `:10092/metrics`.
## Scraping
Prometheus can be configured using these examples:
- a [podMonitor](podMonitor.yaml), if using prometheus-operator or kube-prometheus
- manually via this equivalent [scrape_config](scrape_config.yaml)
## Monitoring
To view all available CNS metrics once Prometheus is correctly configured to scrape:
```promql
count ({job="kube-system/azure-cns"}) by (__name__)
```
CNS exposes standard Go and Prometheus metrics such as `go_goroutines`, `go_gc*`, `up`, and more.
Metrics designed to be customer-facing are generally prefixed with `cx_` and can be listed similarly:
```promql
count ({__name__=~"cx_.*",job="kube-system/azure-cns"}) by (__name__)
```
At time of writing, the following cx metrics are exposed (key metrics in **bold**):
- **cx_ipam_available_ips** (IPs reserved by the Node but not assigned to Pods yet)
- cx_ipam_batch_size
- cx_ipam_current_available_ips
- cx_ipam_expect_available_ips
- **cx_ipam_max_ips** (maximum IPs the Node can reserve from the Subnet)
- cx_ipam_pending_programming_ips
- cx_ipam_pending_release_ips
- **cx_ipam_pod_allocated_ips** (IPs assigned to Pods on the Node)
- cx_ipam_requested_ips
- **cx_ipam_total_ips** (IPs reserved by the Node from the Subnet)
These metrics may be used to gain insight into the current state of the cluster's IPAM.
For example, to view the current IP count requested by each node:
```promql
sum (cx_ipam_requested_ips{job="kube-system/azure-cns"}) by (instance)
```
To view the current IP count allocated to each node:
```promql
sum (cx_ipam_total_ips{job="kube-system/azure-cns"}) by (instance)
```
> Note: if these two values (requested and allocated) do not converge after some time, this indicates an IP provisioning error.
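
The gap between the two values can also be queried directly. The following is a sketch that assumes both series carry the same `instance` label for a given node, which should hold when they come from the same scrape target. A result that stays above zero would point at IPs that were requested but not yet reserved:
```promql
sum (cx_ipam_requested_ips{job="kube-system/azure-cns"}) by (instance)
  - sum (cx_ipam_total_ips{job="kube-system/azure-cns"}) by (instance)
```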
To view the current IP count assigned to pods, per node:
```promql
sum (cx_ipam_pod_allocated_ips{job="kube-system/azure-cns"}) by (instance)
```
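As a rough sketch, the assigned and reserved counts can be combined to show what fraction of each node's reserved IPs is in use. This assumes `cx_ipam_total_ips` is non-zero on the node; where it is zero, the division yields `NaN` or `+Inf`:
```promql
sum (cx_ipam_pod_allocated_ips{job="kube-system/azure-cns"}) by (instance)
  / sum (cx_ipam_total_ips{job="kube-system/azure-cns"}) by (instance)
```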
## Visualizing
A sample Grafana dashboard is included at [grafana.json](grafana.json).
Visualizations included are:
- Per Node
  - CNS Status (Up/Down)
  - Requested IPs
  - Reserved IPs
  - Used IPs
  - Requested/Reserved/Used vs Time
- Per Cluster
  - Total Reserved IPs vs Time
  - Total Used IPs vs Time
  - Reserved and Assigned vs Time
  - Cluster Subnet Utilization Percentage vs Time
  - Cluster Subnet Utilization Total vs Time
  - Node Headroom (how many additional Nodes can be added to the Cluster based on the Subnet capacity)
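
The dashboard's actual queries are defined in grafana.json. As an illustration only, and not necessarily the expressions the dashboard uses, cluster-wide totals comparable to the "Total Reserved IPs" and "Total Used IPs" panels could be derived from the metrics listed above by summing across all nodes:
```promql
sum (cx_ipam_total_ips{job="kube-system/azure-cns"})
```
```promql
sum (cx_ipam_pod_allocated_ips{job="kube-system/azure-cns"})
```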

File diff not shown because it is too large.

@ -0,0 +1,14 @@
## This example podMonitor config can be used with a Prometheus-Operator
## managed Prometheus to automatically discover and collect azure-cns metrics.
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: azure-cns
  namespace: kube-system
spec:
  podMetricsEndpoints:
  - port: metrics
  selector:
    matchLabels:
      k8s-app: azure-cns

@ -0,0 +1,76 @@
## This example Prometheus scrape-config can be used with a manually
## configured Prometheus to collect azure-cns metrics.
- job_name: azure-cns
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  enable_http2: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_label_k8s_app, __meta_kubernetes_pod_labelpresent_k8s_app]
    separator: ;
    regex: (azure-cns);true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    separator: ;
    regex: metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: kube-system/azure-cns
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: metrics
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    modulus: 1
    target_label: __tmp_hash
    replacement: $1
    action: hashmod
  - source_labels: [__tmp_hash]
    separator: ;
    regex: "0"
    replacement: $1
    action: keep
  kubernetes_sd_configs:
  - role: pod
    kubeconfig_file: ""
    follow_redirects: true
    enable_http2: true
    namespaces:
      own_namespace: false
      names:
      - kube-system