Commit Graph

111 Commits

Author SHA1 Message Date
bragi92 5e1e97d6bc
fix: update buildx to use acr image (#1011)
[comment]: # (Note that your PR title should follow the conventional
commit format: https://conventionalcommits.org/en/v1.0.0/#summary)
# PR Description

[comment]: # (The below checklist is for PRs adding new features. If a
box is not checked, add a reason why it's not needed.)
# New Feature Checklist

- [ ] List telemetry added about the feature.
- [ ] Link to the one-pager about the feature.
- [ ] List any tasks necessary for release (3P docs, AKS RP chart
changes, etc.) after merging the PR.
- [ ] Attach results of scale and perf testing.

[comment]: # (The below checklist is for code changes. Not all boxes
necessarily need to be checked. Build, doc, and template changes do not
need to fill out the checklist.)
# Tests Checklist

- [ ] Have end-to-end Ginkgo tests been run on your cluster and passed?
To bootstrap your cluster to run the tests, follow [these
instructions](/otelcollector/test/README.md#bootstrap-a-dev-cluster-to-run-ginkgo-tests).
  - Labels used when running the tests on your cluster:
    - [ ] `operator`
    - [ ] `windows`
    - [ ] `arm64`
    - [ ] `arc-extension`
    - [ ] `fips`
- [ ] Have new tests been added? For features, have tests been added for
  this feature? For fixes, is there a test that could have caught this
  issue and could validate that the fix works?
  - [ ] Is a new scrape job needed?
    - [ ] The scrape job was added to the folder
      [test-cluster-yamls](/otelcollector/test/test-cluster-yamls/) in the
      correct configmap or as a CR.
  - [ ] Was a new test label added?
    - [ ] A string constant for the label was added to
      [constants.go](/otelcollector/test/utils/constants.go).
    - [ ] The label and description was added to the [test
      README](/otelcollector/test/README.md).
    - [ ] The label was added to this [PR
      checklist](/.github/pull_request_template).
    - [ ] The label was added as needed to
      [testkube-test-crs.yaml](/otelcollector/test/testkube/testkube-test-crs.yaml).
  - [ ] Are additional API server permissions needed for the new tests?
    - [ ] These permissions have been added to
      [api-server-permissions.yaml](/otelcollector/test/testkube/api-server-permissions.yaml).
  - [ ] Was a new test suite (a new folder under `/tests`) added?
    - [ ] The new test suite is included in
      [testkube-test-crs.yaml](/otelcollector/test/testkube/testkube-test-crs.yaml).
2024-11-06 18:31:35 +00:00
bragi92 226984b2b2
[feat] windows golang update (#969)
2024-11-04 08:30:30 -08:00
bragi92 bcd2d5c009
Sign flagged binaries for windows containers (#1001)

---------

Co-authored-by: Soham Dasgupta <sohamdg081992@gmail.com>
Co-authored-by: Vishwanath <visnara@microsoft.com>
2024-10-24 18:04:25 -07:00
Grace Wehner f8711332bc
ci/cd: pipeline reliability and testkube fixes (#998)
# PR Description
- Skip some SDL tasks for branch builds. Still run for PRs and merges to
main
- Add retries to trivy task if failing to pull from the DB. Do not retry
if the scan actually ran and failed because of vulnerabilities
- Enable backup DB that trivy has added through an env var.
- Check for arc proxy cluster to be ready and add retries.
- Fix testkube configmap yaml to scrape correct node-exporter port
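The retry rule for the trivy task described above (retry only when the vulnerability DB could not be pulled, never when the scan ran and reported findings) could be sketched in Go roughly as follows. This is an illustration, not the pipeline's actual implementation: `scanResult`, `scanWithRetry`, `errScanFailed`, and the `dbPullMarker` string used to recognize a DB download failure are all hypothetical names and assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// dbPullMarker is an assumed substring that identifies a DB download failure
// in the scanner output. The real message may differ.
const dbPullMarker = "failed to download vulnerability DB"

// errScanFailed is a hypothetical error standing in for a non-zero exit code.
var errScanFailed = errors.New("scanner exited non-zero")

// scanResult is a hypothetical summary of one scanner invocation.
type scanResult struct {
	Output string // combined stdout/stderr of the run
	Err    error  // non-nil if the scanner exited non-zero
}

// isDBPullFailure reports whether a failed run looks like a DB download
// problem rather than a completed scan that found vulnerabilities.
func isDBPullFailure(r scanResult) bool {
	return r.Err != nil && strings.Contains(r.Output, dbPullMarker)
}

// scanWithRetry runs scan up to maxAttempts times. It retries only DB pull
// failures; any other outcome (success or a real scan failure) is returned
// immediately together with the number of attempts made.
func scanWithRetry(scan func() scanResult, maxAttempts int) (scanResult, int) {
	var r scanResult
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		r = scan()
		if !isDBPullFailure(r) {
			return r, attempt
		}
	}
	return r, maxAttempts
}

func main() {
	calls := 0
	// Simulated scanner: two DB download failures, then a clean run.
	scan := func() scanResult {
		calls++
		if calls < 3 {
			return scanResult{Output: dbPullMarker, Err: errScanFailed}
		}
		return scanResult{Output: "scan complete"}
	}
	_, attempts := scanWithRetry(scan, 5)
	fmt.Println("attempts:", attempts)
}
```

The sketch omits any delay between attempts; a real pipeline task would likely wait or back off before retrying.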
2024-10-17 19:18:58 +00:00
Vishwanath 228e832afa Update azure-pipeline-build.yml
add branch to test
2024-10-15 02:11:07 -04:00
Grace Wehner 77dcfe3d39
Fix: NODE_IP replacement in static_configs (#989)
# PR Description
Move NODE_IP out of the incorrect if statement block.
Add a check that up=1 to the NODE_IP replacement test.
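As a rough illustration of what a NODE_IP replacement in `static_configs` involves, the Go sketch below substitutes a placeholder in a scrape config unconditionally, rather than inside a branch that only runs for some configs. The `$NODE_IP` placeholder spelling, the `replaceNodeIP` helper, and the sample target are assumptions made for this example and are not taken from the repository's code.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceNodeIP substitutes the node IP placeholder in a scrape config.
// Hypothetical helper: the placeholder name "$NODE_IP" is an assumption.
func replaceNodeIP(config, nodeIP string) string {
	// The substitution runs for every config, not only inside an
	// unrelated conditional block.
	return strings.ReplaceAll(config, "$NODE_IP", nodeIP)
}

func main() {
	cfg := "static_configs:\n- targets: ['$NODE_IP:9100']\n"
	fmt.Print(replaceNodeIP(cfg, "10.0.0.4"))
}
```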
2024-10-05 00:15:14 +00:00
Grace Wehner 7770fda690
Release: upgrade version, fix test files (#988) 2024-10-04 02:45:17 +00:00
Grace Wehner b029482622
Build: add CI/CD AKS cluster with http proxy (#986)
Co-authored-by: Vishwanath <visnara@microsoft.com>
2024-10-03 20:07:34 +00:00
Grace Wehner b6ccf5226e
Fix: upgrade components (#979)
# PR Description
## Upgrades
- Golang: 1.21.5 -> 1.22.7
- OtelCollector/Operator: 0.99.0 -> 0.109.0
- Telegraf: 1.28.5 -> 1.29.4
## Changes
- Remove extra components from otelcollector-builder that we aren't
using
- Add single golang_version variable to the pipeline so that we only
need to make one change to upgrade golang. This variable is also an ARG
in the dockerfiles
- Update upgrades README with latest info

---------

Co-authored-by: Vishwanath <visnara@microsoft.com>
2024-10-02 22:39:05 +00:00
Grace Wehner 74a753dcf0
Arc: new release image, enable target-allocator, add CI/CD proxy cluster (#977)
# PR Description
### Chart: 
- Enable the target-allocator for Arc
- Disable the HPA chart setting for Arc

### Pipeline:
- Add deploying to an Arc CI/CD proxy cluster
- Add running testkube after every merge and nightly for the Arc CI/CD
cluster
- Add BUILD_WINDOWS pipeline variable for faster building of the chart
when just testing linux

### Tests:
- README clarifications
- Ignore proxy for some localhost curl calls in the tests
2024-09-25 03:00:46 +00:00
bragi92 e6fad87f5e
compliance: add codeql to build pipeline (#939)
2024-07-16 10:45:02 -07:00
rashmichandrashekar 516c07e880
Upgrade components to 0.99 and use golang for config processing (#891)
# PR Description
1. Upgrade otelcollector and TA to 0.99
2. Remove Prometheus-Operator custom build
3. Use golang for configuration parsing instead of shell scripts
4. Use golang exe as the main exe for container start
5. Configure liveness probe from configuration-reader sidecar to
accommodate removal of hot reload
6. Web handler changes to work with latest version 

---------

Co-authored-by: Kaveesh Dubey <kadubey@microsoft.com>
Co-authored-by: Grace Wehner <grace.wehner@microsoft.com>
Co-authored-by: Vishwanath <visnara@microsoft.com>
Co-authored-by: Sohamdg081992 <31517098+Sohamdg081992@users.noreply.github.com>
2024-07-12 13:59:16 -07:00
Vishwanath 0bbd50f55c
(Draft PR) Revert Telegraf removal (#899)
# PR Description

Reverted PRs #766 and #841.

2024-05-29 20:10:24 +00:00
Grace Wehner fd42f0e07f
Test: fix flaky test timeouts (#897) 2024-05-29 01:57:45 +00:00
Sohamdg081992 b959a085f1
Ksm upgrade to 2.12.0 (#887)
# PR Description
Upgrade KSM to 2.12.0.
I upgraded the chart using a backdoor deployment and found no issues.
Build link:
https://github-private.visualstudio.com/azure/_build/results?buildId=80951&view=logs&s=6884a131-87da-5381-61f3-d7acc3b91d76

I compared the old and new KSM Helm chart versions for the templates we
use (deployment, role, service, etc.) and found no major changes. Most
of the changes parameterize properties based on values in values.yaml,
but our addon chart's values-template.yaml does not define those values,
so we are unaffected. Here is the comparison of the K-S-M
[charts](https://github.com/prometheus-community/helm-charts/compare/kube-state-metrics-5.10.1...kube-state-metrics-5.19.0#diff-964657bf9c31e2d1338046dc10aff7a7d28dc34813c6cd09d84228512e966132L25).

Screenshots of metrics flowing after upgrade:
<img width="1876" alt="ksmupgrade"
src="https://github.com/Azure/prometheus-collector/assets/31517098/7aed7ae8-4a7b-4453-86b2-062293b81344">

<img width="1901" alt="ksmupgrade1"
src="https://github.com/Azure/prometheus-collector/assets/31517098/19771142-1991-4b92-9131-83e957162be5">

This
[change](https://github.com/kubernetes/kube-state-metrics/pull/2145) was
tested as follows.

The annotations are flowing before and after the upgrade (screenshot
below). The metrics are flowing fine and the upgrade was successful.

Change:
If the annotation or label has no configured allowed values
(--metric-annotations-allowlist, --metric-labels-allowlist), no
object_annotations or object_labels metrics should be created.

For our testing, neither flag (--metric-annotations-allowlist,
--metric-labels-allowlist) was changed and the upgraded chart was
deployed. The default labels and annotations stopped flowing after the
upgrade.

Before upgrade:
<img width="1738" alt="before"
src="https://github.com/Azure/prometheus-collector/assets/31517098/b4ec863f-3d40-4506-bdaa-ba46f1c64dad">


After upgrade:
<img width="1892" alt="after"
src="https://github.com/Azure/prometheus-collector/assets/31517098/95fe91a5-0f5b-4273-96cd-8d51a6d393ab">





2024-05-23 22:41:35 -07:00
Grace Wehner f8299c6708
Infra: switch to managed identity for Arc extension version release (#895)
# PR Description

Use a dev managed identity for our private arc release for our CI/CD
cluster. This uses [workload identity federation
credentials](https://devblogs.microsoft.com/devops/public-preview-of-workload-identity-federation-for-azure-pipelines/)
as a service connection to be able to use the managed identity in the
pipeline build agent.

Use the same release managed identity we use for our other ev2 release
to push the image to the prod ACR for our Arc prod release now.
2024-05-21 15:23:38 -07:00
Sohamdg081992 56bc7e3d06
fix release script - install oras (#894)
# PR Description
fix release script - update image which has oras installed

Fix for build
[failure](https://ev2portal.azure.net/#/Rollout/ContainerInsightsAgent/6d20c2ec-6a1b-4b04-815e-6b38e6612b8d?RolloutInfra=Prod)

2024-05-21 14:33:53 -07:00
Sohamdg081992 16185f37a6
Fix the signature artifacts drop issue (#885)
2024-05-10 08:58:02 -07:00
Sohamdg081992 079dabbc35
Update service connection and other parameters for ESRP v5 (#872)
# PR Description
1. Update service connection and other parameters for ESRP v5
2. Update to sign all images

Doc followed for the new ESRP setup: [MS Sign Knowledge Base - ADO Task
v5
(sharepoint.com)](https://microsoft.sharepoint.com/teams/prss/Codesign/SitePages/ADO%20Task%20v5.aspx?OR=Teams-HL&CT=1714049743060&xsdata=MDV8MDJ8c29oZGFzZ3VwdGFAbWljcm9zb2Z0LmNvbXw4YTc4ZGVmNDJkODg0MjJmNTM4ZDA4ZGM2NTQwMDU4NHw3MmY5ODhiZjg2ZjE0MWFmOTFhYjJkN2NkMDExZGI0N3wxfDB8NjM4NDk2NTcyODU0MjA1ODAwfFVua25vd258VFdGcGJHWnNiM2Q4ZXlKV0lqb2lNQzR3TGpBd01EQWlMQ0pRSWpvaVYybHVNeklpTENKQlRpSTZJazFoYVd3aUxDSlhWQ0k2TW4wPXwwfHx8&sdata=aHV6OG95VHVCbmc4OE9ETkN2bGRWMndMZHh0Y0R3WDdIUkRVWnF3ZG5jTT0%3d&clickparams=eyAiWC1BcHBOYW1lIiA6ICJNaWNyb3NvZnQgT3V0bG9vayIsICJYLUFwcFZlcnNpb24iIDogIjE2LjAuMTc1MzEuMjAwOTAiLCAiT1MiIDogIldpbmRvd3MiIH0%3D)
2024-05-06 11:47:34 -07:00
Vishwanath 07438b2767
Upgrade ME (for hdinsights OOM bug) (#877)
* Upgrade ME for both Linux and Windows (this fixes the HDInsight OOM
issue). ME version: metricsext2-2.2024.328.1744 ->
metricsext2-2.2024.419.1535
* Bump version for the release
* Update release notes
* Update .trivyignore for new CVEs (need to be fixed in next release)


There is no significant difference before vs. after in CPU or memory
usage for ds (Linux and Windows) or rs (see below). The successful
metric ingestion volume is also the same, with no drops in ME.

ds (linux) --

<img width="1851" alt="image"
src="https://github.com/Azure/prometheus-collector/assets/10353076/1adc92cb-5ea9-46cf-9a43-642f13e4cf5f">



ds (windows) --

<img width="1862" alt="image"
src="https://github.com/Azure/prometheus-collector/assets/10353076/c991f778-0fb1-4cdc-89e4-56ef427e144b">



rs --

<img width="1861" alt="image"
src="https://github.com/Azure/prometheus-collector/assets/10353076/e99fbd95-2002-42b1-be6f-745b273bc895">



2024-05-04 02:54:38 +00:00
Grace Wehner 764d203256
Test: Replace SP with managed identity, add test retries, increase wait time for windows containers (#866)
# PR Description
- Replace AMW querying with managed identity instead of a service
principal
- Add FlakeAttempts() to certain tests to enable retries
- Increase wait time for windows containers to fully start up before
running tests
- Add targetallocator watcher restart error to be ignored in the
container logs tests as this is expected
- Update go.mod/go.sum to latest packages for vulnerabilities

[comment]: # (The below checklist is for code changes. Not all boxes
necessarily need to be checked. Build, doc, and template changes do not
need to fill out the checklist.)
# Tests Checklist

- [x] Have end-to-end Ginkgo tests been run on your cluster and passed?
To bootstrap your cluster to run the tests, follow [these
instructions](/otelcollector/test/README.md#bootstrap-a-dev-cluster-to-run-ginkgo-tests).
  - Labels used when running the tests on your cluster:
    - [x] `operator`
    - [x] `windows`
    - [x] `arm64`
    - [ ] `arc-extension`
- [x] Have new tests been added? For features, have tests been added for
this feature? For fixes, is there a test that could have caught this
issue and could validate that the fix works?
2024-04-30 15:04:47 -07:00
Sohamdg081992 0c09e04ed6
Update reference app images with lifecycle annotations (#869)
# PR Description
This change adds lifecycle annotations to the reference app images.

2024-04-29 11:24:04 -07:00
Sohamdg081992 2dc11c3ab8
add remaining sdl scans similar to onebranch default (#858)
Add the remaining SDL scans based on Official build default behavior
https://eng.ms/docs/products/onebranch/securitycompliancegovernanceandpolicies/sdlforcontainerizedworkflows
2024-04-24 00:23:18 +00:00
Sohamdg081992 79c6c76477
Add life cycle metadata to container image (#842)
This change adds life cycle metadata to the container image.

Followed this document:
https://eng.ms/docs/more/containers-secure-supply-chain/lifecycle-and-lineage

This adds metadata end-of-life annotation at the image build time so
that it does not get flagged later in downstream.
2024-04-23 08:36:45 -07:00
Grace Wehner 0f950fd1f4
Test: small fixes to CRs and running tests (#835)
- Run go mod tidy to upgrade protobuf package
- Increase wait time to run merge tests after the new chart was deployed
- Add operator tests by default to dev cluster
- Convert UTC to PDT to run at midnight PT
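
As a rough illustration of the time conversion in the last item, the sketch below computes the UTC time that corresponds to midnight at a fixed UTC-7 offset (PDT). It assumes the schedule is expressed as a cron string evaluated in UTC, which this commit message does not confirm; `utc_cron_for_local_midnight` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Pacific Daylight Time. Pacific Standard Time would be
# UTC-8, so a fixed-offset schedule drifts by an hour outside daylight time.
PDT = timezone(timedelta(hours=-7), "PDT")

def utc_cron_for_local_midnight(tz):
    """Return a daily cron expression (in UTC) for midnight in `tz`."""
    local_midnight = datetime(2024, 4, 17, 0, 0, tzinfo=tz)
    utc_time = local_midnight.astimezone(timezone.utc)
    return f"{utc_time.minute} {utc_time.hour} * * *"

print(utc_cron_for_local_midnight(PDT))  # 0 7 * * *
```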
2024-04-17 09:54:37 -07:00
Grace Wehner 8259a72515
Build: always pass test tasks for now add repo checkout to running testkube tests step (#834) 2024-04-15 15:03:56 -07:00
Grace Wehner 1b3ce25550
Build: fix deployment to ci/cd cluster changes (#831) 2024-04-15 19:15:13 +00:00
Grace Wehner 45bcd7c496
Build: fix deploy condition to use not(succeeded()) instead of failed() to account for skipped condition (#827) 2024-04-12 09:21:09 -07:00
Grace Wehner 0db6d70084
Test: Adding e2e tests, test README, and pipeline infra for tests (#695)
### PR Process
- Add pull request template, including a checklist for testing
- Add a semantic conventional commit check for PR titles (for creating
the changelog for releases)

### Pipeline
- Split pipeline into two stages: `Build` (images and charts) and
`Deploy` (on CI/CD clusters)
- Add an environment lock to the `Deploy` stage so that only one merge
to main can acquire that lock at a time
- Add running the TestKube tests on the CI/CD clusters to the `Deploy`
stage, so that a new version cannot be deployed on the CI/CD clusters
until the current tests have finished running
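
The effect of the environment lock described above can be sketched with an ordinary mutex. This is only an analogy in Python: the real lock is a pipeline feature, and `deploy_and_test` and the version names are hypothetical.

```python
import threading

deploy_lock = threading.Lock()   # stands in for the environment lock
events = []                      # ordered record of what happened

def deploy_and_test(version):
    # Only one merge can hold the lock, so a new version cannot be
    # deployed until the tests for the current one have finished.
    with deploy_lock:
        events.append(("deploy", version))
        events.append(("tests-finished", version))

threads = [threading.Thread(target=deploy_and_test, args=(v,))
           for v in ("merge-1", "merge-2", "merge-3")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each deploy is immediately followed by its own test run.
for i in range(0, len(events), 2):
    print(events[i], events[i + 1])
```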

### Docs
- Add a testing README for processes, Ginkgo and TestKube info, how to
run the tests and bootstrap clusters, how to add new tests, etc.

### Tests
- Add the `otelcollector/tests` folder that has the starting e2e tests,
split up into test suites

---------

Co-authored-by: bragi92 <kadubey@microsoft.com>
2024-04-11 18:00:42 +00:00
Sohamdg081992 f315c8a504
Remove telegraf from linux (#766)
This version uses the cpu plugin to collect system-level CPU information
from the container, as opposed to the exec plugin, which collects CPU from
the otelcollector and me processes individually. The exec plugin is not
supported in distroless images.

Apart from the below testing, I have also tested the telemetry flow by
restarting the otelcol and me process multiple times and it works fine.


------------------------------------
Screenshots for Memory reduction after removing telegraf

Without Telegraf
<img width="1817" alt="GrafanaWOTelMem%"
src="https://github.com/Azure/prometheus-collector/assets/31517098/4be2b94d-9aa6-4f2a-af68-3df83e1a05c2">
<img width="1877" alt="GrafanaWOTelRS"
src="https://github.com/Azure/prometheus-collector/assets/31517098/f7a9df54-a9fc-4a13-89f8-6c1a637e1d09">
<img width="1864" alt="GrafanaWOTelDS"
src="https://github.com/Azure/prometheus-collector/assets/31517098/d0478680-2f82-475a-b7f3-ac25a6b673bf">


With Telegraf
<img width="1877" alt="GrafanaWTelMem%"
src="https://github.com/Azure/prometheus-collector/assets/31517098/86f508b7-b0c5-40bd-802d-3f16b1b0cd62">
<img width="1862" alt="GrafanaWTelRS"
src="https://github.com/Azure/prometheus-collector/assets/31517098/e48242bb-8f8e-41b3-b642-182554729449">
<img width="1843" alt="GrafanaWTelDS"
src="https://github.com/Azure/prometheus-collector/assets/31517098/cb6c0bd1-b4dd-412f-8b30-c63dd3776387">


---------------------------------

Value comparison of Telegraf vs. Fluent Bit (both components were
running on the same pods for this testing). The values from both are
close.


I deployed the reference app and enabled scraping. Then I increased the
number of replicas to 10 for a few minutes and then reduced it to 1
again. The HI RS and HI DS values were recorded during this spike; the
Low RS and Low DS values were recorded at other times.

<img width="1561" alt="portforward"
src="https://github.com/Azure/prometheus-collector/assets/31517098/f672ec28-3ca6-417a-a2a5-40a2ac1591d9">


---------


Fluent Bit

| Metric | HI DS | HI RS | Low DS | Low RS |
|-----------------------------|--------------|--------------|--------------|--------------|
| otelcolVMRSSAvg | 117068000 | 120916000 | 111536000 | 119460000 |
| otelcolVMRSS95 | 117068000 | 122388000 | 111536000 | 121312000 |
| meVMRSSAvg | 44834816 | 41312256 | 58975872 | 48975872 |
| meVMRSS95 | 45671628 | 48975872 | 48976472 | 48975872 |
| otelcpuUsageAvg(%) | 52 | 49 | 28 | 48 |
| otelcpuUsage95(%) | 57 | 53.5 | 30 | 55 |
| meCpuUsageAvg(%) | 43 | 38.5 | 49.5 | 41 |
| meCpuUsage95(%) | 58.5 | 79 | 57 | 51.5 |

-----------------------------------

---------


Telegraf

| Metric | HI DS | HI RS | Low DS | Low RS |
|-----------------------------|--------------------|--------------------|--------------------|--------------------|
| otelcollector_memory_rss_050 | 126353408 | 120803328 | 124489728 | 144173056 |
| otelcollector_memory_rss_095 | 126353408 | 135784038 | 128032768 | 146022400 |
| metricsextension_memory_rss_050 | 44834816 | 41312256 | 36065280 | 55238656 |
| metricsextension_memory_rss_095 | 45671628 | 48975872 | 37960908 | 55238656 |
| otelcollector_cpu_usage_050 | 0.321608126437242 | 0.143547678977113 | 0.171110652970617 | 0.190275998078321 |
| otelcollector_cpu_usage_095 | 0.458843069727404 | 0.959536547881522 | 0.286772234370546 | 0.696191806862753 |
| metricsextension_cpu_usage_050 | 0.274309766455724 | 0.401607109869771 | 0.234063760776055 | 0.40123204052676 |
| metricsextension_cpu_usage_095 | 0.485236937139407 | 0.617179716178943 | 0.332776288758776 | 0.609020321347796 |
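
To put a number on "close values", the sketch below computes the relative gap between the two sources for the otelcollector 95th-percentile RSS rows. The figures are copied from the tables above; pairing `otelcolVMRSS95` with `otelcollector_memory_rss_095` as the same quantity is an assumption based on the names, and `relative_gap` is a hypothetical helper.

```python
# otelcollector RSS, 95th percentile, in bytes (from the tables above).
fluent_bit_rss95 = {"HI DS": 117068000, "HI RS": 122388000,
                    "Low DS": 111536000, "Low RS": 121312000}
telegraf_rss95 = {"HI DS": 126353408, "HI RS": 135784038,
                  "Low DS": 128032768, "Low RS": 146022400}

def relative_gap(a, b):
    """Absolute difference as a fraction of the larger value."""
    return abs(a - b) / max(a, b)

gaps = {column: relative_gap(fluent_bit_rss95[column], telegraf_rss95[column])
        for column in fluent_bit_rss95}

for column, gap in gaps.items():
    print(f"{column}: {gap:.1%}")
```

The CPU rows are not compared here because the two tables appear to use different units (percentages versus fractional values).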


-----------------------------------

Below are screenshots of all metric types. 

<img width="1873" alt="envVars"
src="https://github.com/Azure/prometheus-collector/assets/31517098/31d584dc-4fc4-4125-93fe-e9bbd6a1a313">

<img width="1408" alt="mecpu"
src="https://github.com/Azure/prometheus-collector/assets/31517098/3e07eba4-9382-43a7-8da6-825b9815b1d5">
<img width="1418" alt="mecpuDimension"
src="https://github.com/Azure/prometheus-collector/assets/31517098/77958c04-3e87-48ef-8bf8-a50da1f56169">
<img width="1462" alt="meVmrss95"
src="https://github.com/Azure/prometheus-collector/assets/31517098/1ad6b877-0f9a-4ecd-ae8a-c1878e3cb248">
<img width="1436" alt="otelcolvmrss95"
src="https://github.com/Azure/prometheus-collector/assets/31517098/89295ca8-0e45-4cc6-a285-92bdc425c64b">
<img width="1444" alt="otelcpu"
src="https://github.com/Azure/prometheus-collector/assets/31517098/8f987866-de5f-4d6a-a632-45ebcf6c50ff">

----------------------------------------

The section below has screenshots of operator-related telemetry from
both Telegraf and Fluent Bit, showing that the dimensions are
identical.

Telegraf based screenshots

![TelegrafOperatorModel1](https://github.com/Azure/prometheus-collector/assets/31517098/c9a5c2e5-790d-43c1-842e-e05a03f5b807)

![TelegrafOperatorModel2](https://github.com/Azure/prometheus-collector/assets/31517098/6784d7aa-2b26-4607-a845-3cb502e3d6a8)


![OperatorMetricType](https://github.com/Azure/prometheus-collector/assets/31517098/e3b468bb-fe4a-4acf-b818-72516718e81c)

Fluent Bit based screenshots


![FBOperatorModel1](https://github.com/Azure/prometheus-collector/assets/31517098/99f3c7f7-9b73-417d-80c7-72481356cb77)

![FBOperatorModel11](https://github.com/Azure/prometheus-collector/assets/31517098/219c5573-db13-4915-a5ea-b1394cb1b070)

![FBOperatorModel22](https://github.com/Azure/prometheus-collector/assets/31517098/6033a2dd-097e-421d-92fd-9db0df0a96e5)

![FBOperatorModel21](https://github.com/Azure/prometheus-collector/assets/31517098/4ea25259-d39c-4a89-a312-5e54e2372ed2)



--------------------------------------------------------------------------------------

Below are screenshots of the metric names from Telegraf and Fluent Bit;
they show that both are the same.

Telegraf based screenshots:

Target allocator metric names


![TelegrafTAMetricName](https://github.com/Azure/prometheus-collector/assets/31517098/0b1b07d5-bd17-4a39-ae02-40bf88d7e2da)

Prometheus metric names


![TelegrafPromMetricName](https://github.com/Azure/prometheus-collector/assets/31517098/cd26e205-33b6-4b99-83ae-bb0ca23dcf4d)

CPU usage metric names for a process


![TelegrafCPUMetricName](https://github.com/Azure/prometheus-collector/assets/31517098/2487280a-d424-4e91-a03f-d52ba830f9c1)

Memory usage metric names for a process     


![TelegrafMemoryMetricName](https://github.com/Azure/prometheus-collector/assets/31517098/0cf90dab-2c9d-4c9b-96ab-c01525d039ac)
 

Fluent Bit based screenshots:

Target allocator metric names

![FBOperatorMetricName1](https://github.com/Azure/prometheus-collector/assets/31517098/8d689ad4-50ac-46ed-8b93-494a7253153d)

![FBOperatorMetricName2](https://github.com/Azure/prometheus-collector/assets/31517098/727281e1-d832-48fe-b75c-061f9459698d)
<img width="926" alt="FBOperatorMetricName3"
src="https://github.com/Azure/prometheus-collector/assets/31517098/fcdf1643-59e1-485a-8ddf-02859b4e33cb">


Prometheus metric names

<img width="929" alt="FBPromMetricName"
src="https://github.com/Azure/prometheus-collector/assets/31517098/477712ec-f49e-4f44-947d-708f1552b39f">


CPU usage metric names for a process

![FBCPUMetricName](https://github.com/Azure/prometheus-collector/assets/31517098/e9641544-642b-42da-a279-74ec7f4e21eb)


Memory usage metric names for a process 


![FBMemoryMetricName1](https://github.com/Azure/prometheus-collector/assets/31517098/c3a9f0fa-a9fe-49cc-9ae1-b8632672f787)
2024-04-11 08:44:42 -07:00
bragi92 82adbf971f
fix: Update rollout spec with proper ccp step name (#815) 2024-04-09 09:04:21 -07:00
Grace Wehner c70eb5c8d0
Fix: For Arc, use a default value CloudEnvironment that customers can change for other clouds (#812) 2024-04-05 18:44:28 +00:00
Grace Wehner ff260f1426
Upgrade: fluent-bit from 2.0.9 to 2.1.10 (#809)
- Add new .so files for upgraded fluent-bit
- Fix container log collection path for Windows
- Add Windows container CPU and memory usage telemetry for
prometheus-collector and addon-token-adapter-win containers
- Only send keeplistregex and scrapeinterval telemetry from the
replicaset since these env vars are the same no matter if it's on the
daemonset or replicaset
- Add app insights key env var in Dockerfile to non-distroless build for
whenever building without the distroless image
- Fix bug in build for image check before deploying to the dev clusters
2024-04-04 18:04:34 +00:00
Grace Wehner ffe40e0096
Build: pipeline feedback and fixes (#803) 2024-03-28 02:02:08 +00:00
bragi92 a80cff04fe
build: push ccp linux image to prod mcr (#805) 2024-03-27 17:25:50 -07:00
bragi92 ac6dcf130f
feat : add support for per cloud AI instance (#798) 2024-03-26 18:57:49 +00:00
bragi92 5fbdacd70c
Step 0 : Merge CCP changes to main with a separate image (#653)
Co-authored-by: Nina Segares <ninasegares@microsoft.com>
2024-03-14 18:51:18 +00:00
Grace Wehner 9a2ffa8523
build: onebranch migration (#778) 2024-03-13 23:00:32 +00:00
Sohamdg081992 72a9604c7d
Correct PV alert syntax (#757)
This is a copy-paste bug that was introduced in the ARM template and
also in Bicep (since it is generated from ARM). I have double-checked all
the recommended alerts in our repo (both ARM and Bicep) for syntax errors
in Grafana, and this was the only one.
2024-02-27 10:39:11 -08:00
Grace Wehner 0349c8630e
[infra] Add remote ACR registry cache to improve linux build times (#729)
* use link in build

* commit all changes

* increase windows build timeout

* get build

* combine windows RUN statements

* try windows with buildx

* try linux registry cache

* get buildx version

* fix build failures

* fix cache formatting

* max cache mode

* fix dockerfile

* try caching for all images

* include windows cache

* cleanup

* remove branch build
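
For context on the registry cache and "max cache mode" items above, this sketch assembles a `docker buildx build` command line that reads from and writes to a registry-backed cache. The `--cache-from` and `--cache-to` options with `type=registry` and `mode=max` are standard buildx options; the registry and image names are hypothetical placeholders, and the exact arguments used in this repository's pipeline are not shown in the commit message.

```python
def buildx_cache_args(cache_ref):
    """Return buildx arguments for a registry-backed build cache."""
    return [
        # Reuse layers from the cache image stored in the registry.
        "--cache-from", f"type=registry,ref={cache_ref}",
        # Export the cache after the build; mode=max also stores layers
        # from intermediate stages, not only the final image.
        "--cache-to", f"type=registry,ref={cache_ref},mode=max",
    ]

# Hypothetical registry and image names, for illustration only.
cmd = [
    "docker", "buildx", "build",
    *buildx_cache_args("example.azurecr.io/cache/linux"),
    "--tag", "example.azurecr.io/app:dev",
    "--push",
    ".",
]
print(" ".join(cmd))
```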
2024-02-13 13:59:39 -08:00
bragi92 fbfe7ae2a4
fix: stop copying libssl.so.1.1 & libcrypto.so.1.1 as they are already available in distroless mariner (#739)
* fix: stop copying libssl.so.1.1 & libcrypto.so.1.1 as they are already available with openssl in distroless and copying them over causes FIPS HMAC verification failures

* remove extra branch

* release note update

* .
2024-02-08 01:34:38 +00:00
Grace Wehner caf7fa07ca
[infra] Fix commented out ARC deploy chart condition (#737) 2024-02-07 04:40:34 +00:00
Grace Wehner f67a8b5042
[Arc] Upgrade node-exporter chart from 4.21.0 to 4.26.0 (#733) 2024-02-02 10:16:23 -08:00
Sohamdg081992 764bce20d1
Add telemetry for collector and addon token adaptor (#734)
* Add telemetry for addon token adaptor and prometheus collector

* add test branch

* Create new metric for Pod name

* remove branch
2024-01-30 11:55:09 -08:00
Grace Wehner 8e8734168b
[infra] Increase Windows build timeout from 60 minutes to 120 2024-01-24 16:52:03 -08:00
bragi92 ce3b3b04f6
SDL Requirment : add policheck (#726)
* Add policheck

* Update azure-pipeline-build.yml

* try on windows

* Update azure-pipeline-build.yml
2024-01-23 23:47:19 +00:00
Grace Wehner 6ece1329e6
add checking if state is failed for arc (#724) 2024-01-19 23:16:06 +00:00
Grace Wehner 28027cfdde
[infra] fixes for build improvements (#720) 2024-01-16 23:03:36 +00:00
rashmichandrashekar a192d342a6
adding back release steps for 1p helm chart (#715) 2024-01-09 23:50:37 +00:00
rashmichandrashekar 834a854d3c
Fix build issues (#714) 2024-01-09 19:23:04 +00:00