Commit Graph

11 Commits

Author SHA1 Message Date
Hunter Gregory 9181fa67d0
ci: [NPM] add bash directive to scale scripts (#2876)
ci: add bash directive to scale scripts

Signed-off-by: Hunter Gregory <42728408+huntergregory@users.noreply.github.com>
2024-07-25 17:43:27 +00:00
Hunter Gregory 7e90960ed0
test(scale): [NPM] fix flakes in kwok and capture kernel state on failure (#2249)
* test(kwok): try standard tier for cluster

* Revert "test(kwok): try standard tier for cluster"

This reverts commit f76e50a559.

* test: run kwok as pod

* fix: add execute permission to sh files

* fix: allow scheduling on linux for kwok pod

* fix: wait timeouts and add retry logic

* fix: make sure to reapply kwok nodes if wait fails

* test: print out cluster state if wait fails

* test: prevent kwok from scheduling on windows node

* test: first wait for kwok pods (20 minutes)

* style: rearrange wait check

* fix: scale up kwok controller for reliability

* fix: typo in scaling kwok pods

* fix: check kwok pods running in test-connectivity instead of test-scale

* fix: wait for pods before adding NetPol

* fix: 7 second timeout for windows agnhost connect

* feat: get cluster state on failure

* debug: fake a failure to verify log capture

* fix: bugs in getting cluster state

* fix: remove newline instead of "n"

* Revert "debug: fake a failure to verify log capture"

This reverts commit 24ec927425.

* feat(win-debug): get prom metrics

* fix: leave timeout=5s for win

* style: remove new, unused --connect-timeout parameter

* style: comment

* feat: top node/pod
2023-11-29 15:37:46 -08:00
Camryn Lee f71277149a
ci: cilium scale pipeline (#2085)
* ci: adding scale pipeline for cilium

* change timeout

* change timeout

* update net policies

* use acn build pool to avoid azp delays

* address comments -- update build pool var and remove timeouts from scale

* address comments -- set node/pod counts and test input as variables

* remove all test resources

* check apachebench rollout status

* collect more cpu/mem results

* add cns restart and fix artifact upload

* add cns restart and fix artifact upload

* update name for artifact publishing

* update apachebench artifact collection

* update apachebench artifact collection

* test cns version check

* change cns update

* update artifact directory name

* add netperf testing stage

* give permissions to netperf script

* change netperf steps

* update path to netperf yaml

* change netperf deployment to 2

* get correct pods in netperf script

* publish netperf results

* publish netperf results

* add same vm test for netperf

* netperf script to print pod values

* netperf find nodes logic

* netperf find nodes logic
2023-08-14 22:37:50 +00:00
Camryn Lee ce0e06cfe0
test: create nginx deployments in scale test (#2074)
* test: create nginx deployments in scale test

* address comments and define --realPodType

* remove instances of numRealNginxDeployment

* move label-nodes script
2023-07-25 21:13:59 +00:00
Vipul Singh 68cddea8ee
fix: [CI] mismatch in the variable name to deploy the services in scale test (#2070)
fix: mismatch in the variable name to deploy the services in scale test
2023-07-21 21:35:04 +00:00
Vipul Singh 51152f3d0f
test:[NPM] Adding service deployment for the scale test (#2007)
2023-06-20 02:30:20 +00:00
Hunter Gregory 940a7a73d0
ci: [NPM] improve scale pipeline & fix edge case in scale script (#1975)
* ci: check if directory is empty before applying it

* ci: don't wait for pods if they weren't created

* docs: fix script name

* ci: wip for enabling linux scale test

* ci: parameters for linux vs windows

* ci: adjust params

* ci: fix bash typo

* ci: fix cp

* ci: fix npm url

* ci: increase max pods for linux nodepool

* ci: start building windows image again

* tmp: use apply netpol in background image

* Revert "tmp: use apply netpol in background image"

This reverts commit eff43c5439.

* refactor: use CLUSTER_NAME variable

* ci: require succeeded() for scale & conformance tests

* test: fix vars used in test-scale.sh checks

* ci: disable linux, reenable windows

* ci: increase sleep before waiting for NPM to start & log info when it doesn't

* ci: better log capture & remove command from other pipeline

* ci: do not get logs of npm on kwok nodes

* ci: do not get logs of npm on kwok nodes (part 2)
2023-06-09 10:00:03 -07:00
Vipul Singh a0d2b2505d
test: Adding prometheus config for acn docs (#1964)
2023-06-07 14:47:25 +00:00
Hunter Gregory ce11a8da2b
ci: [NPM] scale test pipeline using KWOK (#1915)
* wip

* temporarily disable most conf runs

* update readme

* back to raw yamls and clone the branch to run scale test

* fix raw yaml URLs

* fix inline script

* fix length of rg name

* uncomment all conf again

* comment out everything unnecessary for testing

* remove commented out dependencies

* use master branch for pipeline

* label nodes

* multiple nodes

* uncomment rest of conformance pipeline (originally commented for testing)

* fix print out for time taken in test-connectivity.sh

* fix: run kwok command in background

* mkdir for kwok log

* try azure cli 1 to fix login error

* Revert "try azure cli 1 to fix login error"

This reverts commit f1671e3939.

* move scale test to new pipeline yaml

* remove scale test from conformance pipeline yaml

* revert name change for cyclonus job

* remove unnecessary image build and variable

* error codes and display names

* change sleep and wait for npm logic

* look at directory

* use pre-cloned repo

* fix directory path

* install kubectl first

* FIXME: comment out succeeded condition

* kubectl binary arg

* kubectl for scale test

* fix label selector

* fix kubectl path

* fix kubectl binary arg

* fix kwok, more steps

* FIXME: temporarily use custom fast image

* fix kwok pid and add comment

* 10m timeout for connectivity after crud

* fix kwok command invocation

* bump up timeouts for testing

* higher memory limit

* add note to connectivity script

* fix sed

* no need to curl npm yaml

* tmp: comment things out to test final step

* only check if kwok pods are running, not necessarily ready

* Revert "tmp: comment things out to test final step"

This reverts commit 7b21125ab1.

* update registry keys to fix HNS reliability

* update regkey code

* sleep to let NPM restart in case of bootup failure

* adaptive wait timeout

* change some errors to warnings

* log date

* make sure all pods are labeled

* delete and readd labels after deleting pods

* tmp: skip large scale up and connectivity check for testing

* fix overwrite arg

* rename tasks and uncomment things

* update command for updating reg key

* make timeout logic simpler

* back to reg add command for regkeys

* official timeouts instead of test values

* delete task updating registry keys and stop hardcoding npm image

* increase sleep
2023-05-18 09:57:21 -07:00
Hunter Gregory d9104f06f7
test: expand KWOK scripts to test deletions and more (#1910)
* correct num acls

* print out pods for debugging

* fix latency value printed out

* unapplied netpols and delete chaos

* print out new args

* simplify kwok install and run script

* fix typo

* example run

* fix sleep and num ipsets

* better run-kwok.sh

* fix sleep

* fix run kwok

* fix pipe

* fix labels on pods

* fix delete

* add connectivity to readme

* Update README.md

* fix printing

* wait for pods to come up after deleting

* 'all' option
2023-04-14 10:10:40 -07:00
Hunter Gregory 9e337baff5
test: k8s scale testing with KWOK (#1879)
* k8s scale testing with kwok

* Update README.md

* fix netpol labels so that they apply to pods

* test connectivity

* parameterize scripts

* rename scale script and update readme

* clean up readme

* add NetPol after connectivity check

* retry connectivity loops

* fix connectivity test script and netpol

* script to capture cpu/mem

* fix typo in help

* kwok kubeconfig

* fix cpu and mem capture
2023-04-06 12:34:53 -07:00