* wip: apply dirty NetPols every 500ms in Linux
* only build npm linux image
* fix: check for empty cache
* feat: toggle for netpol interval. default 500 ms
* ci: remove stages "build binaries" and "run windows tests"
* wip: max batched netpols (toggle-specified)
* ci: remove manifest build/push for win npm
* wip: handle ipset deletion properly and max batch for delete too
* fix: correct remove policy
* fix: only remove policy if it was in kernel
* finalize toggles, allowing ability to turn off iptablesInBackground
* ci: conf + cyc use PR's configmaps
* fix: lints
* fix dp toggle: iptablesInBackground
* fix lock typo and config logging
* fix background thread. add comments. only add tmp ref when enabled
* copy pod selector list
* fix: removepolicy needs namespace too
* rename opInfo to event
* fix: fix references and prevent concurrent map read/write
* tmp: debug logging
* fix: missing set references by swap keys and values
* Revert "tmp: debug logging"
This reverts commit 70ed34c714ea4a6d009a1fe90a7168be4bedd5bf.
* fix: add podSelectorList to fake NetPol
* log: do not print error when failing to delete non-existent nft rule
* log: verbose iptables bootup
* log: use fmt.Errorf for clean logging
* log: never return error for iptables in background and fix some lints
* fix: activate/deactivate azure chain rules
* fix: correctly decrement netpols in kernel
* ci: run UTs again
* ci: update profiles. default to placefirst=false
* address comment: rename batch to pendingPolicy
* refactor: make dirty cache OS-specific
* test: UTs
* test: put UT cfg back to placefirst to not break things
* ci: update cyclonus workflows
* fmt: address comment & lint
* fmt: rename numInKernel to policiesInKernel
* log: switch to fmt.Errorf
* fmt: whitespace
* feat: resiliency to errors while reconciling dirty netpols
* log: temporarily print everything for ipset restore
* fix: remove nomatch from ipset -D for cidr blocks
* test: UTs for non-happy path
* test: fix hns fake
* fix: don't change windows. let it delete ipsets when removing policies
* fix windows lint
* fix: ignore chain doesn't exist errors for iptables -D
* feat: latency and failure metrics
* test: update exit code for UT
* metrics: new metrics should go in node-metrics path
* style: simplify nesting
* style: move identical windows & linux code to shared file
* ci: remove v1 conformance and cyclonus
* feat: add NetPols in background from the DP (revert background code in pMgr)
* style: remove "background" from iptables metrics
* revert changes in ipsetmanager, const.go, and dp.Remove/UpdatePolicy
* style: whitespace
* perf: use len() instead of creating slice from map
* remove verbosity for iptables bootup
* build: add return statement
* style: whitespace
* build: fix variable shadowing
* build: fix more import shadowing
* build: windows pointer issue and UT issue
* test: fix UT for iptables error code 2
* ci: enable linux scale test
* ci: revert to master pipeline.yaml
* revert changes to chain-management. do changes in PR #2012
* log: change wording
* test: UTs for netpol in background
* log: wording
* feat: apply ipsets for each netpol individually
* config: rearrange ConfigMap & update capz yaml
* fix: windows bootup phase logic for addpolicy
* feat: restrict netpol in background to linux + nftables
* test: skip nftables check for UT
* style: netpols[0] instead of loop
* log: address log comments
* style: lint for long line
---------
Co-authored-by: Vamsi Kalapala <vakr@microsoft.com>
* Block wireserver connectivity from pods in cilium cluster
* commenting arm build image step
* enabling arm image build
* fixed wget cmd and added test for overlay
* fixed wget-job.yaml
* fixed scripts
* get job yaml for debugging on failure
* Updated scripts.
Included wireserver test for swift pipeline
Reverted pipeline.yaml changes
* update cilium agent to include init container which blocks wireserver access
* update cilium agent with init container which blocks wireserver access
* CNS to be able to generate dualstack overlay CNI conflist (#1981)
* fix: Eliminating duplicate lines
* ci: Add update permission for ciliumidentity
* fix: Parameterize Image Registry
add retry to nnc update during scaledown (#1970)
* add retry to nnc update during scaledown
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
* test for panic in pool monitor
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
---------
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
fix: reserve 0th IP as gateway for overlay on Windows (#1968)
* fix: reserve 0th IP as gateway for overlay on Windows
* fix: allow gateway to be updated
ci: windows profile container image (#1988)
Always use 0 for NC version in Overlay (#1979)
always use 0 for NC version in overlay
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
[Vnet Scale - CNS]: Flattening CIDR ranges for Node NNC to a list (#1921)
* Read secondary CIDRs from VnetScale NNC
* fix comment
* update comment
* For VnetScale mode, Use 1st IP for def gateway instead of 0th for windows
* fix/add import
* address pr comments
* add comments
* address pr comments
* wrap error
* fix typo
* fix UT
fix: [NPM] check if policy exists in case of nil pointer (#1974)
fix: check for nil first
ci: disable kube-proxy for test clusters (#1965)
* disable kube-proxy for byocni cluster creation
* test config mapping
* shell pwd
* use CURDIR
* check current directory
* test with repo root dir
* test azp format
* test azp format
* test azp format
* change e2e steps to remove kube proxy
* fix load test update args
* fix ns and rg in update
* update ciliume2e
* fix kubectl cmd in load test
* adding new targets for no kube proxy
* remove cluster update
* update overlay e2e
* test behavior of load test
* test grep for azure-cns
* look for container deployment
* testing
* restart node variable check
* update if condition
* add skip node case
---------
Co-authored-by: tamilmani1989 <tamanoha@microsoft.com>
perf: [WIN-NPM] fast bootup (#1900)
* wip
* wip2
* use other apply DP func
* address comment about if statement
* finish bootup for both DPs
* fix lint
* fix lint 2
* fix lint 3
* longer UT timeout and add missing UTs for apply in background
tool: [NPM] script to clean up iptable chains (#1978)
tool: script to clean up NPM iptable chains
feat: [WIN-NPM] metrics for latencies and failures (#1959)
* implement metrics
* add npm prefix
* rename windows files
* metrics pkg UTs
* allow reinitializing prometheus metrics
* fix: hns wrapper should not throw error for empty SetPolicy values
* test: metric UTs in dataplane
* fix: record list endpoint latency always
* remove flaky UT
* feat: metric for max ipset members
* fix lint
* fix lint 2
* fix build
* fix lint 3
* simplify conditionals and protect against maxMembers becoming negative
* remove bottom 4 histogram buckets. start at 16 ms
* reset metrics for ipset UTs
* style: don't check for windows dp in *_windows.go files
* build: remove unused import
* test: reset windows metrics in UT
Remove SSH port 22 rule from aks-engine clusters (#1983)
ci: change overlaye2e stage to cilium-overlay (#1997)
* renaming overlaye2e for cilium
* update display names for stages
Initial getHomeAZ 404 changes (#1994)
* initial getHomeAZ 404 changes
* treat 404 as success
* address comments
CNS to be able to generate dualstack overlay CNI conflist (#1981)
fix: Parameterize Image Registry
CNS to be able to generate dualstack overlay CNI conflist (#1981)
* fix: File Directory
* style: Comments
* Addressing Comments
---------
Co-authored-by: Paul Johnston <35265851+pjohnst5@users.noreply.github.com>
* ci: check if directory is empty before applying it
* ci: don't wait for pods if they weren't created
* docs: fix script name
* ci: wip for enabling linux scale test
* ci: parameters for linux vs windows
* ci: adjust params
* ci: fix bash typo
* ci: fix cp
* ci: fix npm url
* ci: increase max pods for linux nodepool
* ci: start building windows image again
* tmp: use apply netpol in background image
* Revert "tmp: use apply netpol in background image"
This reverts commit eff43c5439.
* refactor: use CLUSTER_NAME variable
* ci: require succeeded() for scale & conformance tests
* test: fix vars used in test-scale.sh checks
* ci: disable linux, reenable windows
* ci: increase sleep before waiting for NPM to start & log info when it doesn't
* ci: better log capture & remove command from other pipeline
* ci: do not get logs of npm on kwok nodes
* ci: do not get logs of npm on kwok nodes (part 2)
* wip
* temporarily disable most conf runs
* update readme
* back to raw yamls and clone the branch to run scale test
* fix raw yaml URLs
* fix inline script
* fix length of rg name
* uncomment all conf again
* comment out everything unnecessary for testing
* remove commented out dependencies
* use master branch for pipeline
* label nodes
* multiple nodes
* uncomment rest of conformance pipeline (originally commented for testing)
* fix print out for time taken in test-connectivity.sh
* fix: run kwok command in background
* mkdir for kwok log
* try azure cli 1 to fix login error
* Revert "try azure cli 1 to fix login error"
This reverts commit f1671e3939.
* move scale test to new pipeline yaml
* remove scale test from conformance pipeline yaml
* revert name change for cyclonus job
* remove unnecessary image build and variable
* error codes and display names
* change sleep and wait for npm logic
* look at directory
* use pre-cloned repo
* fix directory path
* install kubectl first
* FIXME: comment out succeeded condition
* kubectl binary arg
* kubectl for scale test
* fix label selector
* fix kubectl path
* fix kubectl binary arg
* fix kwok, more steps
* FIXME: temporarily use custom fast image
* fix kwok pid and add comment
* 10m timeout for connectivity after crud
* fix kwok command invocation
* bump up timeouts for testing
* higher memory limit
* add note to connectivity script
* fix sed
* no need to curl npm yaml
* tmp: comment things out to test final step
* only check if kwok pods are running, not necessarily ready
* Revert "tmp: comment things out to test final step"
This reverts commit 7b21125ab1.
* update registry keys to fix HNS reliability
* update regkey code
* sleep to let NPM restart in case of bootup failure
* adaptive wait timeout
* change some errors to warnings
* log date
* make sure all pods are labeled
* delete and readd labels after deleting pods
* tmp: skip large scale up and connectivity check for testing
* fix overwrite arg
* rename tasks and uncomment things
* update command for updating reg key
* make timeout logic simpler
* back to reg add command for regkeys
* official timeouts instead of test values
* delete task updating registry keys and stop hardcoding npm image
* increase sleep
* update script to check cns in memory and add to pr pipeline
* adding stage to both overlay and podsubnet cilium stages
* add exit case if privileged pod is not found
* check status of priv pod
* call ds status before exit
* install cilium ds with kubectl and not helm for systemd-networkd initcontainer patch
* upload cilium ds
* adding files for cilium-agent and cilium-operator deployment
* update cilium ds
* addressing comments
* update linux conformance binary
* temporarily comment out test profiles until one works
* Revert "temporarily comment out test profiles until one works"
This reverts commit db623d3833.
* undo change to git checkout for windows
* Updated aks-engine to unblock the pipeline.
* Use the latest aks-e dependency.
* Update aks-e dependency to unblock acn pipeline and set retry to 0 to
get quick result.
* test repo trigger
* add resources label
* add repositories under resources
* test with endpoint
* testing with tag check
* typo in condition line
* evaluate version in condition
* compare tag vars
* test cancellation stage
* add job to cancel stage
* change name of cancel stage
* add pool and msg to cancel job
* change cancellation dependency
* add second tag validation stage
* spacing of validation1
* test ne and cancellation script
* check tag versions
* set currentTagx variable
* test new tag var
* fix value mapping
* check_tag job
* update current tag in manifest stage
* reading tags
* adjust variable assignment
* uncomment sleep
* read out different tags
* test validation dependency
* use succeeded/failed in condition check
* fix condition
* fix call to succeeded()
* omit condition on validation stages to read tags
* remove sleep, just validate condition
* fix error with submod swift naming
* add hack overlay make target and e2e templates
* update pipeline.yaml with new overlay stage
* use dropgz/overlay-byocni in acn pr and submodule pipeline
* name overlay clusters for submodule pipeline
* modify dropgz setup
* test ipam + overlay
* use azilium conflist
* add overlay configmap
* add deployment of ip-masq-agent to cluster
* set flag in ip-masq-agent deployment
* update deployment
* configmap creation
* parameterize and tidy up the hackfile
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
* use region variable in pipelines when building clusters
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
* prevent chance of overlap in resource groups
* print npm pod state and capture previous logs
* fail cyclonus if it doesn't complete
* redirect to /dev/null
* remove incorrect use of --kubeconfig
* Separate the npm test and cni test, comment out npm test first. It's a
testing PR.
* Revert unnecessary change.
* update aks-e version from v1.0.1 to v1.0.2
* Update cluster definition from version 1.22.7 to 1.22.15
* Update resource group name.
* Comment out unnecessary resource group deletion since the
test-kubernetes already did this work.
* Update aks-engine version for acn validation. New version is using the
latest aks-engine release and remove duplicated and unnecessary npm
tests. Remove unnecessary resource clean up step in pipeline since it's
already included in the aks-e e2e test. Also update the kubernetes version accordingly.
* [NPM][Windows] remove preview flag
* remove k8s version in one file
* remove k8s version in another file
* remove k8s version in third file
Co-authored-by: Hunter Gregory <42728408+huntergregory@users.noreply.github.com>
* add new ipam/dropgz e2e test stage
* set test image
* set paths in pipeline.yaml
* set path trigger in ipam-dropgz-job template
* set path as param for ipam-dropgze2e
* testing path trigger with branch
* create submodule pipeline
* add log in /azure-ipam to test pipeline trigger
* remove build for cni-dropgz image in submodule pipeline
* create separate swift + cilium ipam/dropgz test stages
* remove commented test bits
* remove commented lines from setup_test.go
* update paths/triggers and set template parameters
* change cleanup dependsOn
* test parameter call
* export parameter
* skip setting testDropgz instead of setting false
* removing ipam-dropgz templates and focus submod UT on azure-ipam
* remove parsing gatewayIP from azure-ipam unit tests
* update make target test-azure-ipam
* update make target
* only publish test results
* omit npm build
* keep npm build and simplify unit testing stage
* adjust cluster naming to avoid resource overlap in pipelines
* add new dockerfile for testing cni-dropgz
* add make targets for cni-dropgz-test manifest
* modify versioning in cni-dropgz-test manifest
* platform versioning for test image
* fix build of test image
* update go version for dropgz
* building vnet-ipam and vnet-telemetry
* copy azure-vnet* to grab all vnet binaries
* modify build for vnet-ipam
* fix copying of cns conflist
* fix: Remove retry from Windows pipelines
* Adding in make command for windows NPM
* fixing an issue with retry
* adding a retry on task
* fix: remove sub powershell command for windows builds
* pipelines: Adding a new stage for NPM continuous integration
* pipelines: Adding a new stage for NPM continuous integration
* pipelines: Adding a new stage for NPM continuous integration
* removing download phase
* removing download phase
* remove v1 default
* use submodule specific tags
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
* support separate go submodule versions
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
* move version and tag responsibilities to the makefile
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
* update integration tests to use component tags
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
* set up conformance tests
* set path
* testing path
* test
* test
* test
* test
* list installs
* test
* check for cns pods
* skip ds deletion
* remove CNS daemonset deletion
Co-authored-by: Camryn Lee <camrynlee@microsoft.com>
* test cilium cluster installation
* change deploy arg
* test helm deploy
* fix helm args
* test
* test
* test helm install
* split swift/azilium tests
* test
* separate pipeline step for cilium cluster
* fix file name
* test
* test
* test
* test
* change img pull policy
* test
* test
* test iptable SNAT rules
* test configmap change
* fix newline spacing
* toggle iptable in cns-config
* fix cni config format
* test
* test
* test
* test
* Test configmap
* test
* mount xtables lock + set securityContext
* test
* test deploy args
* clean up
* clean up
* update cilium helm vals
* update step template cilium version
* test conflist copy
* Test
Co-authored-by: Camryn Lee <camrynlee@microsoft.com>
* use initContainers
* add suggested changes
* add vni-conflist to volumes
* test w diff region
* modify deploy args
* fix deploy args
Co-authored-by: Camryn Lee <camrynlee@microsoft.com>
* add windows cns manifest to multi arch image
* try to use generic windows template w/ containerize stage in pipeline
* try and use buildah to pull images
* update manifest build and push for buildah
* create manifest by referencing images instead of pulling to avoid OS mismatch error
* remove unused windows-image.yaml
* remove REGISTRY var and use IMAGE_REGISTRY from makefile
Co-authored-by: Jaeryn <tsun.chu@microsoft.com>
Co-authored-by: Evan Baker <rbtr@users.noreply.github.com>
* test logs
* wip
* add IS_STRESS_TEST to matrix
* finish
* Copy instead of move kubeconfig
* fix folder name
* Fix folder name again
* sleep and fix published folder
* should be done
* rearrange and exit properly
* test exit code
* finalized
* make num parallel jobs a global setting
* windows dualstack pipeline
* temporarily commented build images and tests
* fixed if else
* adding back change removed wrongly due to conflict
* updated artifact name
* updated rg name based on OS
* updated cluster name and artifacts
* e2e pipeline for linux dualstack
* separated into jobs
* fixed template
* indent fix
* updated stage name
* disabled build image and run test
* added dependency for cleanup
* added cmds to debug
* updated clone step
* moved to task for deploying aks-engine cluster
* updated container image
* added debug log
* fixed cmd
* updated env rg
* using forked aks-engine version
* reverting the changes
* dns prefix to be auto generated
* updated dns prefix
* get one region
* reversed if-else
* updated node count
* mtest: run conformance for npm profiles in parallel
* update dependsOn field
* FIXME updating trigger for pipeline run for now
* Revert "FIXME updating trigger for pipeline run for now"
This reverts commit dce5ac2fbd.
* fix yaml compile error
* shorten cluster name because it was too long
* remove test for v1-place-azure-chain-first profile
* remove always() condition for cleanup and publish kubeconfig artifact
* rename kubeconfig artifact name
* try putting kubeconfig in the published npmlogs folder
* ipset fail if set doesn't exist when attempting to add to list
* bail out on add to set when ip is empty
* check set type when adding to list
* revert checking nil value
* fix test build issues
* additional ip check
* ip check with parseip
* prometheus count
* delete from set checks
* delete from set checks
* log on skipping pod
* logging pipeline
* npm logs ci
* first pass at decoupling resource maps
* First pass on decoupling resource maps
* Adding telemetry capabilities to resource CRUD events
* Initializing new maps in nprMgr for tests
* Initializing new maps in nprMgr for tests
* Adding artifact for Npm logs
* Addressing comments
* Addressing comments
* First pass at implementing pod cache
* handling namedports in case of pod update
* Correcting print error
* Cleaning up pod cache update event. moving pod cache to nsMAP
* Correcting namespace prefix
* Adding in checks on portlists and Podips
* changing some variable names
* changing some variable names
* Adding resource versions checks for Pod, NS and netpols
* fixing some tests
* changing ResourceVersion to uint64 and cleaning up oldpodobj references
* rearranging hostneptol and correcting a UT failure
* Fixing the hostnet pod UT
* Addressing comments
* fixing UT
* Fixing UTs
* correcting pod delete failure bug
* Fixing clean up bug
* Handling hostnet pods in Delete pod
* Addressing comments and fixing a panic error
* update cns tag in yaml
* pipeline cleanup
Co-authored-by: Matthew Long <Matthew.Long@microsoft.com>
Co-authored-by: Matthew Long <matlong@microsoft.com>
* update e2e to reuse pipeline artifacts
* install ginkgo
* update e2e config for apiserver
* add clean up step
* aks-engine version
* aks-engine version
* aks-engine version
* aks-engine version
* aks-engine version
* aks-engine version
* aks-engine version
* aks-engine version