CNS Prometheus and Grafana examples (#1366)
* cns prometheus examples Signed-off-by: Evan Baker <rbtr@users.noreply.github.com> * grafana samples Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
This commit is contained in:
Родитель
2c77774852
Коммит
8750b346ed
|
@ -0,0 +1,66 @@
|
|||
# Azure CNS metrics
|
||||
azure-cns exposes metrics via Prometheus on `:10092/metrics`
|
||||
|
||||
## Scraping
|
||||
Prometheus can be configured using these examples:
|
||||
- a [podMonitor](podMonitor.yaml), if using promotheus-operator or kube-prometheus
|
||||
- manually via this equivalent [scrape_config](scrape_config.yaml)
|
||||
|
||||
## Monitoring
|
||||
To view all available CNS metrics once Prometheus is correctly configured to scrape:
|
||||
```promql
|
||||
count ({job="kube-system/azure-cns"}) by (__name__)
|
||||
```
|
||||
|
||||
CNS exposes standard Go and Prom metrics such as `go_goroutines`, `go_gc*`, `up`, and more.
|
||||
|
||||
Metrics designed to be customer-facing are generally prefixed with `cx_` and can be listed similarly:
|
||||
```promql
|
||||
count ({__name__=~"cx.*",job="kube-system/azure-cns"}) by (__name__)
|
||||
```
|
||||
At time of writing, the following cx metrics are exposed (key metrics in **bold**):
|
||||
- **cx_ipam_available_ips** (IPs reserved by the Node but not assigned to Pods yet)
|
||||
- cx_ipam_batch_size
|
||||
- cx_ipam_current_available_ips
|
||||
- cx_ipam_expect_available_ips
|
||||
- **cx_ipam_max_ips** (maximum IPs the Node can reserve from the Subnet)
|
||||
- cx_ipam_pending_programming_ips
|
||||
- cx_ipam_pending_release_ips
|
||||
- **cx_ipam_pod_allocated_ips** (IPs assigned to Pods on the Node)
|
||||
- cx_ipam_requested_ips
|
||||
- **cx_ipam_total_ips** (IPs reserved by the Node from the Subnet)
|
||||
|
||||
These metrics may be used to gain insight in to the current state of the cluster's IPAM.
|
||||
|
||||
For example, to view the current IP count requested by each node:
|
||||
```promql
|
||||
sum (cx_ipam_requested_ips{job="kube-system/azure-cns"}) by (instance)
|
||||
```
|
||||
To view the current IP count allocated to each node:
|
||||
```promql
|
||||
sum (cx_ipam_total_ips{job="kube-system/azure-cns"}) by (instance)
|
||||
```
|
||||
> Note: if these two values aren't converging after some time, that indicates an IP provisioning error.
|
||||
|
||||
To view the current IP count assigned to pods, per node:
|
||||
```promql
|
||||
sum (cx_ipam_pod_allocated_ips{job="kube-system/azure-cns"}) by (instance)
|
||||
```
|
||||
|
||||
## Visualizing
|
||||
A sample Grafana dashboard is included at [grafan.json](grafana.json).
|
||||
|
||||
Visualizations included are:
|
||||
- Per Node
|
||||
- CNS Status (Up/Down)
|
||||
- Requested IPs
|
||||
- Reserved IPs
|
||||
- Used IPs
|
||||
- Request/Reserved/Used vs Time
|
||||
- Per Cluster
|
||||
- Total Reserver IPs vs Time
|
||||
- Total Used IPs vs Time
|
||||
- Reserved and Assigned vs Time
|
||||
- Cluster Subnet Utilization Percentage vs Time
|
||||
- Cluster Subnet Utilization Total vs Time
|
||||
- Node Headroom (how many additional Nodes can be added to the Cluster based on the Subnet capacity)
|
Разница между файлами не показана из-за своего большого размера
Загрузить разницу
|
@ -0,0 +1,14 @@
|
|||
## This example podMonitor config can be used with a Prometheus-Operator
|
||||
## managed Prometheus to automatically discover and collect azure-cns metrics.
|
||||
---
|
||||
apiVersion: monitoring.coreos.com/v1
|
||||
kind: PodMonitor
|
||||
metadata:
|
||||
name: azure-cns
|
||||
namespace: kube-system
|
||||
spec:
|
||||
podMetricsEndpoints:
|
||||
- port: metrics
|
||||
selector:
|
||||
matchLabels:
|
||||
k8s-app: azure-cns
|
|
@ -0,0 +1,76 @@
|
|||
## This example Prometheus scrape-config can be used with a manually
|
||||
## configured Prometheus to collect azure-cns metrics.
|
||||
- job_name: azure-cns
|
||||
honor_timestamps: true
|
||||
scrape_interval: 30s
|
||||
scrape_timeout: 10s
|
||||
metrics_path: /metrics
|
||||
scheme: http
|
||||
follow_redirects: true
|
||||
enable_http2: true
|
||||
relabel_configs:
|
||||
- source_labels: [job]
|
||||
separator: ;
|
||||
regex: (.*)
|
||||
target_label: __tmp_prometheus_job_name
|
||||
replacement: $1
|
||||
action: replace
|
||||
- source_labels: [__meta_kubernetes_pod_label_k8s_app, __meta_kubernetes_pod_labelpresent_k8s_app]
|
||||
separator: ;
|
||||
regex: (azure-cns);true
|
||||
replacement: $1
|
||||
action: keep
|
||||
- source_labels: [__meta_kubernetes_pod_container_port_name]
|
||||
separator: ;
|
||||
regex: metrics
|
||||
replacement: $1
|
||||
action: keep
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
separator: ;
|
||||
regex: (.*)
|
||||
target_label: namespace
|
||||
replacement: $1
|
||||
action: replace
|
||||
- source_labels: [__meta_kubernetes_pod_container_name]
|
||||
separator: ;
|
||||
regex: (.*)
|
||||
target_label: container
|
||||
replacement: $1
|
||||
action: replace
|
||||
- source_labels: [__meta_kubernetes_pod_name]
|
||||
separator: ;
|
||||
regex: (.*)
|
||||
target_label: pod
|
||||
replacement: $1
|
||||
action: replace
|
||||
- separator: ;
|
||||
regex: (.*)
|
||||
target_label: job
|
||||
replacement: kube-system/azure-cns
|
||||
action: replace
|
||||
- separator: ;
|
||||
regex: (.*)
|
||||
target_label: endpoint
|
||||
replacement: metrics
|
||||
action: replace
|
||||
- source_labels: [__address__]
|
||||
separator: ;
|
||||
regex: (.*)
|
||||
modulus: 1
|
||||
target_label: __tmp_hash
|
||||
replacement: $1
|
||||
action: hashmod
|
||||
- source_labels: [__tmp_hash]
|
||||
separator: ;
|
||||
regex: "0"
|
||||
replacement: $1
|
||||
action: keep
|
||||
kubernetes_sd_configs:
|
||||
- role: pod
|
||||
kubeconfig_file: ""
|
||||
follow_redirects: true
|
||||
enable_http2: true
|
||||
namespaces:
|
||||
own_namespace: false
|
||||
names:
|
||||
- kube-system
|
Загрузка…
Ссылка в новой задаче