
7 Commits

Author SHA1 Message Date
Evan Baker 6a53d5d49b
feat: Windows HPC base image (#3105)
feat: windows hpc base image

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
2024-11-12 16:42:42 +00:00
Hunter Gregory c561c895fe
ci: [NPM] speed up Windows conformance and disable Windows Cyclonus/Scale (#2874)
* ci: comment out windows cyclonus and windows scale test

Signed-off-by: Hunter Gregory <42728408+huntergregory@users.noreply.github.com>

* ci: slim down windows conformance to 14 tests

Signed-off-by: Hunter Gregory <42728408+huntergregory@users.noreply.github.com>

* ci: make sure conformance skips "Linux Only"

Signed-off-by: Hunter Gregory <42728408+huntergregory@users.noreply.github.com>

* ci: conformance was not running test cases due to formatting

Signed-off-by: Hunter Gregory <42728408+huntergregory@users.noreply.github.com>

---------

Signed-off-by: Hunter Gregory <42728408+huntergregory@users.noreply.github.com>
2024-07-24 01:57:02 +00:00
John Payne e28500c0af
ci: Update triggers for NPM pipelines (#2867)
ci: add triggers to npm pipelines
2024-07-22 19:25:45 +00:00
Hunter Gregory 7e90960ed0
test(scale): [NPM] fix flakes in kwok and capture kernel state on failure (#2249)
* test(kwok): try standard tier for cluster

* Revert "test(kwok): try standard tier for cluster"

This reverts commit f76e50a559.

* test: run kwok as pod

* fix: add execute permission to sh files

* fix: allow scheduling on linux for kwok pod

* fix: wait timeouts and add retry logic

* fix: make sure to reapply kwok nodes if wait fails

* test: print out cluster state if wait fails

* test: prevent kwok from scheduling on windows node

* test: first wait for kwok pods (20 minutes)

* style: rearrange wait check

* fix: scale up kwok controller for reliability

* fix: typo in scaling kwok pods

* fix: check kwok pods running in test-connectivity instead of test-scale

* fix: wait for pods before adding NetPol

* fix: 7 second timeout for windows agnhost connect

* feat: get cluster state on failure

* debug: fake a failure to verify log capture

* fix: bugs in getting cluster state

* fix: remove newline instead of "n"

* Revert "debug: fake a failure to verify log capture"

This reverts commit 24ec927425.

* feat(win-debug): get prom metrics

* fix: leave timeout=5s for win

* style: remove new, unused --connect-timeout parameter

* style: comment

* feat: top node/pod
2023-11-29 15:37:46 -08:00
Hunter Gregory ebddca18bd
perf: [NPM] [LINUX] add NetPols in background (#1969)
* wip: apply dirty NetPols every 500ms in Linux

* only build npm linux image

* fix: check for empty cache

* feat: toggle for netpol interval. default 500 ms

* ci: remove stages "build binaries" and "run windows tests"

* wip: max batched netpols (toggle-specified)

* ci: remove manifest build/push for win npm

* wip: handle ipset deletion properly and max batch for delete too

* fix: correct remove policy

* fix: only remove policy if it was in kernel

* finalize toggles, allowing ability to turn off iptablesInBackground

* ci: conf + cyc use PR's configmaps

* fix: lints

* fix dp toggle: iptablesInBackground

* fix lock typo and config logging

* fix background thread. add comments. only add tmp ref when enabled

* copy pod selector list

* fix: removepolicy needs namespace too

* rename opInfo to event

* fix: fix references and prevent concurrent map read/write

* tmp: debug logging

* fix: missing set references by swapping keys and values

* Revert "tmp: debug logging"

This reverts commit 70ed34c714ea4a6d009a1fe90a7168be4bedd5bf.

* fix: add podSelectorList to fake NetPol

* log: do not print error when failing to delete non-existent nft rule

* log: verbose iptables bootup

* log: use fmt.Errorf for clean logging

* log: never return error for iptables in background and fix some lints

* fix: activate/deactivate azure chain rules

* fix: correctly decrement netpols in kernel

* ci: run UTs again

* ci: update profiles. default to placefirst=false

* address comment: rename batch to pendingPolicy

* refactor: make dirty cache OS-specific

* test: UTs

* test: put UT cfg back to placefirst to not break things

* ci: update cyclonus workflows

* fmt: address comment & lint

* fmt: rename numInKernel to policiesInKernel

* log: switch to fmt.Errorf

* fmt: whitespace

* feat: resiliency to errors while reconciling dirty netpols

* log: temporarily print everything for ipset restore

* fix: remove nomatch from ipset -D for cidr blocks

* test: UTs for non-happy path

* test: fix hns fake

* fix: don't change windows. let it delete ipsets when removing policies

* fix windows lint

* fix: ignore chain doesn't exist errors for iptables -D

* feat: latency and failure metrics

* test: update exit code for UT

* metrics: new metrics should go in node-metrics path

* style: simplify nesting

* style: move identical windows & linux code to shared file

* ci: remove v1 conformance and cyclonus

* feat: add NetPols in background from the DP (revert background code in pMgr)

* style: remove "background" from iptables metrics

* revert changes in ipsetmanager, const.go, and dp.Remove/UpdatePolicy

* style: whitespace

* perf: use len() instead of creating slice from map

* remove verbosity for iptables bootup

* build: add return statement

* style: whitespace

* build: fix variable shadowing

* build: fix more import shadowing

* build: windows pointer issue and UT issue

* test: fix UT for iptables error code 2

* ci: enable linux scale test

* ci: revert to master pipeline.yaml

* revert changes to chain-management. do changes in PR #2012

* log: change wording

* test: UTs for netpol in background

* log: wording

* feat: apply ipsets for each netpol individually

* config: rearrange ConfigMap & update capz yaml

* fix: windows bootup phase logic for addpolicy

* feat: restrict netpol in background to linux + nftables

* test: skip nftables check for UT

* style: netpols[0] instead of loop

* log: address log comments

* style: lint for long line

---------

Co-authored-by: Vamsi Kalapala <vakr@microsoft.com>
2023-07-19 09:13:52 -07:00
Hunter Gregory 940a7a73d0
ci: [NPM] improve scale pipeline & fix edge case in scale script (#1975)
* ci: check if directory is empty before applying it

* ci: don't wait for pods if they weren't created

* docs: fix script name

* ci: wip for enabling linux scale test

* ci: parameters for linux vs windows

* ci: adjust params

* ci: fix bash typo

* ci: fix cp

* ci: fix npm url

* ci: increase max pods for linux nodepool

* ci: start building windows image again

* tmp: use apply netpol in background image

* Revert "tmp: use apply netpol in background image"

This reverts commit eff43c5439.

* refactor: use CLUSTER_NAME variable

* ci: require succeeded() for scale & conformance tests

* test: fix vars used in test-scale.sh checks

* ci: disable linux, reenable windows

* ci: increase sleep before waiting for NPM to start & log info when it doesn't

* ci: better log capture & remove command from other pipeline

* ci: do not get logs of npm on kwok nodes

* ci: do not get logs of npm on kwok nodes (part 2)
2023-06-09 10:00:03 -07:00
Hunter Gregory ce11a8da2b
ci: [NPM] scale test pipeline using KWOK (#1915)
* wip

* temporarily disable most conf runs

* update readme

* back to raw yamls and clone the branch to run scale test

* fix raw yaml URLs

* fix inline script

* fix length of rg name

* uncomment all conf again

* comment out everything unnecessary for testing

* remove commented out dependencies

* use master branch for pipeline

* label nodes

* multiple nodes

* uncomment rest of conformance pipeline (originally commented for testing)

* fix print out for time taken in test-connectivity.sh

* fix: run kwok command in background

* mkdir for kwok log

* try azure cli 1 to fix login error

* Revert "try azure cli 1 to fix login error"

This reverts commit f1671e3939.

* move scale test to new pipeline yaml

* remove scale test from conformance pipeline yaml

* revert name change for cyclonus job

* remove unnecessary image build and variable

* error codes and display names

* change sleep and wait for npm logic

* look at directory

* use pre-cloned repo

* fix directory path

* install kubectl first

* FIXME: comment out succeeded condition

* kubectl binary arg

* kubectl for scale test

* fix label selector

* fix kubectl path

* fix kubectl binary arg

* fix kwok, more steps

* FIXME: temporarily use custom fast image

* fix kwok pid and add comment

* 10m timeout for connectivity after crud

* fix kwok command invocation

* bump up timeouts for testing

* higher memory limit

* add note to connectivity script

* fix sed

* no need to curl npm yaml

* tmp: comment things out to test final step

* only check if kwok pods are running, not necessarily ready

* Revert "tmp: comment things out to test final step"

This reverts commit 7b21125ab1.

* update registry keys to fix HNS reliability

* update regkey code

* sleep to let NPM restart in case of bootup failure

* adaptive wait timeout

* change some errors to warnings

* log date

* make sure all pods are labeled

* delete and re-add labels after deleting pods

* tmp: skip large scale up and connectivity check for testing

* fix overwrite arg

* rename tasks and uncomment things

* update command for updating reg key

* make timeout logic simpler

* back to reg add command for regkeys

* official timeouts instead of test values

* delete task updating registry keys and stop hardcoding npm image

* increase sleep
2023-05-18 09:57:21 -07:00