fix bug in pct calculation in memory usage alert (#900)

[comment]: # (Note that your PR title should follow the conventional
commit format: https://conventionalcommits.org/en/v1.0.0/#summary)
# PR Description
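Fix a bug in the percentage calculation of the memory usage alert. The expression compared the raw usage-to-limit ratio against 75; it now multiplies the ratio by 100 before comparing it to the threshold.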

I have deployed this fix to the near-ring clusters
[wcus](https://ms.portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/9b96ebbd-c57a-42d1-bbe9-b69296e4c7fb/resourcegroups/monitoring-metrics-amw/providers/Microsoft.AlertsManagement/prometheusRuleGroups/containerinsights_monitoring-metrics-amw-wcus_alerts/Listing)
and
[euseuap](https://ms.portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/9b96ebbd-c57a-42d1-bbe9-b69296e4c7fb/resourceGroups/monitoring-metrics-amw/providers/Microsoft.AlertsManagement/prometheusRuleGroups/containerinsights_monitoring-metrics-amw-eus2euap_alerts/Listing).
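
To illustrate the bug, here is a minimal Python sketch of the arithmetic behind the alert expression. It is not part of the change itself: the function names and the sample byte values are hypothetical, and it assumes the working set stays at or below the memory limit, so the ratio lies between 0 and 1.

```python
MIB = 1024 ** 2


def memory_usage_ratio(working_set_bytes: float, limit_bytes: float) -> float:
    """Working set divided by the memory limit (a fraction, not a percentage)."""
    return working_set_bytes / limit_bytes


def old_alert_fires(working_set_bytes: float, limit_bytes: float,
                    threshold_pct: float = 75.0) -> bool:
    # Pre-fix logic: the raw fraction is compared against a percentage threshold.
    return memory_usage_ratio(working_set_bytes, limit_bytes) > threshold_pct


def fixed_alert_fires(working_set_bytes: float, limit_bytes: float,
                      threshold_pct: float = 75.0) -> bool:
    # Post-fix logic: the fraction is scaled to a percentage before comparing.
    return memory_usage_ratio(working_set_bytes, limit_bytes) * 100 > threshold_pct


# Hypothetical container using 900 MiB of a 1000 MiB limit (90% usage).
working_set = 900 * MIB
limit = 1000 * MIB

print(old_alert_fires(working_set, limit))    # False: 0.9 is never greater than 75
print(fixed_alert_fires(working_set, limit))  # True: 90 is greater than 75
```

Under that assumption the pre-fix comparison cannot exceed the threshold, which is consistent with the alert staying silent until the `* 100` factor was added.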


[comment]: # (The below checklist is for PRs adding new features. If a
box is not checked, add a reason why it's not needed.)
# New Feature Checklist

- [ ] List telemetry added about the feature.
- [ ] Link to the one-pager about the feature.
- [ ] List any tasks necessary for release (3P docs, AKS RP chart
changes, etc.) after merging the PR.
- [ ] Attach results of scale and perf testing.

[comment]: # (The below checklist is for code changes. Not all boxes
necessarily need to be checked. Build, doc, and template changes do not
need to fill out the checklist.)
# Tests Checklist

- [ ] Have end-to-end Ginkgo tests been run on your cluster and passed?
To bootstrap your cluster to run the tests, follow [these
instructions](/otelcollector/test/README.md#bootstrap-a-dev-cluster-to-run-ginkgo-tests).
  - Labels used when running the tests on your cluster:
    - [ ] `operator`
    - [ ] `windows`
    - [ ] `arm64`
    - [ ] `arc-extension`
    - [ ] `fips`
- [ ] Have new tests been added? For features, have tests been added for
this feature? For fixes, is there a test that could have caught this
issue and could validate that the fix works?
  - [ ] Is a new scrape job needed?
    - [ ] The scrape job was added to the folder
      [test-cluster-yamls](/otelcollector/test/test-cluster-yamls/) in the
      correct configmap or as a CR.
  - [ ] Was a new test label added?
    - [ ] A string constant for the label was added to
      [constants.go](/otelcollector/test/utils/constants.go).
    - [ ] The label and description was added to the [test
      README](/otelcollector/test/README.md).
    - [ ] The label was added to this [PR
      checklist](/.github/pull_request_template).
    - [ ] The label was added as needed to
      [testkube-test-crs.yaml](/otelcollector/test/testkube/testkube-test-crs.yaml).
  - [ ] Are additional API server permissions needed for the new tests?
    - [ ] These permissions have been added to
      [api-server-permissions.yaml](/otelcollector/test/testkube/api-server-permissions.yaml).
  - [ ] Was a new test suite (a new folder under `/tests`) added?
    - [ ] The new test suite is included in
      [testkube-test-crs.yaml](/otelcollector/test/testkube/testkube-test-crs.yaml).
Sohamdg081992 2024-05-29 15:05:29 -07:00 committed by GitHub
Parent 0bbd50f55c
Commit 3e72c0e41c
No known key found for this signature
GPG key ID: B5690EEEBB952194
2 changed files with 3 additions and 3 deletions

View file

@@ -218,7 +218,7 @@
},
{
"alert": "Memory usage % greater than 75 for prometheus-collector containers on cluster ci-dev-aks-mac-eus",
-"expression": "(sum(container_memory_working_set_bytes{namespace=\"kube-system\", container=\"prometheus-collector\", image!=\"\"}) by (container, pod) / sum(kube_pod_container_resource_limits{namespace=\"kube-system\", container=\"prometheus-collector\", resource=\"memory\"}) by (container, pod)) > 75",
+"expression": "(sum(container_memory_working_set_bytes{namespace=\"kube-system\", container=\"prometheus-collector\", image!=\"\"}) by (container, pod) / sum(kube_pod_container_resource_limits{namespace=\"kube-system\", container=\"prometheus-collector\", resource=\"memory\"}) by (container, pod)) * 100> 75",
"for": "PT3M",
"annotations": {
"description": "Memory usage greater than 75% for prometheus-collector containers on cluster ci-dev-aks-mac-eus"

View file

@@ -937,7 +937,7 @@
},
{
"alert": "[Concat('Memory usage greater than 75% for prometheus-collector containers on cluster', parameters('clusterName'))]",
-"expression": "(sum(container_memory_working_set_bytes{namespace=\"kube-system\", container=\"prometheus-collector\", image!=\"\"}) by (container, pod) / sum(kube_pod_container_resource_limits{namespace=\"kube-system\", container=\"prometheus-collector\", resource=\"memory\"}) by (container, pod)) > 75",
+"expression": "(sum(container_memory_working_set_bytes{namespace=\"kube-system\", container=\"prometheus-collector\", image!=\"\"}) by (container, pod) / sum(kube_pod_container_resource_limits{namespace=\"kube-system\", container=\"prometheus-collector\", resource=\"memory\"}) by (container, pod)) * 100> 75",
"for": "PT3M",
"annotations": {
"description": "[Concat('Memory usage greater than 75% for prometheus-collector containers on cluster', parameters('clusterName'))]"
@@ -976,4 +976,4 @@
}
],
"outputs": {}
-}
+}