41 Commits

Author SHA1 Message Date
Hunter Gregory 2460582541
scale: [NPM] ignore leaked ipset references (#1502)
* fix indexing for restore retries and log failed line

* flush and destroy in separate ipset restore calls

* restorer logging

* check for leaked ipsets

* update v1 and v2 to use iptables wait constant

* fix some UTs that have start error

* remove iptables reference check and member check

* tmp: skip flush for testing

* Revert "tmp: skip flush for testing"

This reverts commit 8c513062f9.

* fix lint

* clarify lint

* address comments

* address nit
2022-08-09 11:39:13 -07:00
Hunter Gregory 26a4b6571e
feat: [NPM] include NPM v1/v2 in telemetry and fix heartbeat log (#1266)
* include NPM v1/v2 in telemetry

* fix heartbeat
2022-03-08 10:39:16 -08:00
Hunter Gregory 1ea2f5a745
feat: [NPM] num ACL rules for v2 & update existing metrics (#1223)
* wip

* fix windows build err

* address comments

* fix lingering merge conflicts
2022-02-09 20:38:46 -08:00
Hunter Gregory c820189a2f
feat: [NPM] send more AI logs (#1230)
* send heartbeat log and send logs in v2

* address comment and add logs for ip validation and policy manager bootup
2022-02-09 15:28:05 -08:00
Hunter Gregory 9d8e61e7ae
feat: [NPM] Forward Compatibility: delete v2 chains in v1, and Restructure Code (#1200)
* delete v2 chains in v1 and move code around

* leave v1 the same, and add v2 chains to the list to flush/destroy

* update go mod and package name

* Revert "update go mod and package name"

This reverts commit 46b674c440.

* rename folder

* address comments

* remove extra ioutil import statement

* update mod/vendoring

* Revert "update mod/vendoring"

This reverts commit 436897e6cb.

* back to original go.mod

* update build tags for ioutil files used in v1
2022-01-27 12:36:50 -08:00
Hunter Gregory b69f8b6c20
fix: [NPM] detect multi-digit numbers when getting iptables line number (#1199)
* fix: detect multi-digit line number

* fix bug in v1
2022-01-24 11:10:08 -08:00
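The commit title does not say how the rule number was parsed before the fix. The Go sketch below is illustrative only. It scans output in the style of `iptables -L FORWARD --line-numbers` and parses the whole first column with `strconv.Atoi` rather than only its first character, so rule 12 is not read as rule 1. The function name `ruleNumber` and the sample listing are hypothetical, and this is not NPM's implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// ruleNumber (hypothetical helper) scans listing text in the style of
// `iptables -L <chain> --line-numbers`. It returns the number of the first
// rule whose target column equals target, and false if no rule matches.
func ruleNumber(listing, target string) (int, bool) {
	sc := bufio.NewScanner(strings.NewReader(listing))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 || fields[1] != target {
			continue
		}
		// Parse the whole first field: "12" must become 12, not 1.
		num, err := strconv.Atoi(fields[0])
		if err != nil {
			continue // not a numbered rule line (e.g. a header)
		}
		return num, true
	}
	return 0, false
}

func main() {
	// Abbreviated sample listing for illustration, not captured from a real node.
	listing := `Chain FORWARD (policy ACCEPT)
num  target        prot opt source     destination
1    KUBE-FORWARD  all  --  0.0.0.0/0  0.0.0.0/0
12   AZURE-NPM     all  --  0.0.0.0/0  0.0.0.0/0`
	fmt.Println(ruleNumber(listing, "AZURE-NPM")) // 12 true
}
```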
Hunter Gregory 389f4ce3e5
fix: [NPM] remove print statement in v1 and update testpolicy names (#1184)
* rename two policies

* remove print statement in v1
2022-01-04 10:56:57 -08:00
Hunter Gregory db3c7065bb
fix: [NPM] reposition iptables jump to AZURE-NPM chain (#1086)
* put jump from forward to azure-npm chain above the one to kube-services

* update unit test

* add toggle for chain position

* incorporate toggle in iptm and update UTs. v1 controller tests seem broken

* rename toggle name

* jump to azure chain on new ct state and update default toggle (UTs will break)

* make util constant for UTs and fix UT errors (besides ones I get for controllers)

* added missing module args for ctstate NEW

* reconcile jump to azure chain at top

* delete deprecated jump to azure chain on uninit, and fix go lint

* assign correct default toggle value

* addressed comments

* fix UTs after removing index 1 for placing chain first. Also make all tests subtests for check and add forward chain

* set PlaceAzureChainFirst: true

* switch to correct default for PlaceAzureChainFirst
2021-11-16 15:30:34 -08:00
Hunter Gregory 0dd10e4e89
NPM Prometheus Update (#986)
* made prometheus exec time metrics for ipsets and iptables in line with those for network policies (exec time recorded even for failures). Also made prometheus timer variable names clearer.

* fixed faulty prometheus handler test looking for a node metric name when testing the cluster metric handler

* add clarity in comments related to the IPSetInventory metric

* Include prometheus metrics for lists and in DestroyNPMIpsets(). Only make metric updates when there's no error

* refactor prometheus testing and include metric tests for lists and NPMDestroyIpsets()

* better check for empty response to ipset list in DestroyNpmIpsets()

* remove unused clientset from controllers

* replace function for setting ipset inventory with function for removing ipset for better readability. updating comments too

* reset ipset inventory before each unit test

* added unit test for adding to set with pod cache

* remove unused cluster state function and clientset from np manager

* fix build problems: remove clientset from calls to npm.NewNetworkPolicyManager()

* fix ipset destroy logic for when destroy is called while the number of ipsets is 0

* delete commented out function

* encapsulated prometheus metrics, refactored prometheus testing for iptm and netpol controller, and removed clientset from controller creation in test files (fixing build error)

* update test for DestroyNpmIpsets() to always use a new Exec
2021-09-10 15:53:58 -07:00
Evan Baker 96bec09d41
chore: appease the linter (3/?), the big gofumpt (#987)
* gofumpt -w -s .

* small addtl cleanups after gofumpt

* rerun after rebase
2021-09-02 16:33:18 -05:00
Evan Baker 1087201b28
chore: appease the linter, pt 2 of ? (#925) 2021-09-01 18:28:17 -05:00
JungukCho 83b6240933
[NPM] Make each controller in npm independent (#949)
* Locate key data structures in each controller, remove locks in each controller where possible, and move locking down into the ipset manager

* Create npmNameSpaceCache to manage shared namespace objects and mutex

* Encapsulate listMap and setMap in the ipset manager struct so they are not exposed to other packages.
Remove unnecessary code and clean up initialization code.

* Encapsulate methods and members to avoid unintentional access to variables and to better manage shared resources. Correct UTs and clean up code

* Update expected values in UTs according to the architectural change and clean up code (remove unnecessary comments and duplicated logging)

* Add comments and clean up code (removing redundant code, etc.)

* Remove threadness variable to avoid unintentionally increasing the number of workers in each controller without safe synchronization

* Handle errors and return values correctly based on lint hints

* Remove error handling code when initializing iptables and ipset at NPM startup

* Call the correct function to avoid a UT failure

* Resolve comments

* Correct lint's complaint

* Correct chain order

* resolved llc lint warnings and renamed variables

* Resolved lint warnings if possible

* Removed unnecessary variables and code

* Add custom encoding and decoding logic for NPMCache

* Revise UT case in server_test.go and resolve lint warning

* Add npm cache file which was revised based on custom encoding and revise corresponding UTs

* Add unit test for npmCache (need to remove redundancy in server_test.go)

* Resolve lint warnings
2021-08-27 16:05:40 -07:00
JungukCho 6502e78952
[NPM] Remove unused codes to save and restore ipset and iptables (#952)
* Fix save and restore function to work properly with real exec

* Delete unused save and restore codes

* Clean-up deadcodes (i.e., save and restore) for ipset

* Remove an unused variable
2021-08-03 14:29:03 -07:00
Mathew Merrick 9b24dbd95a
test: [NPM] Use fakeexec for ipsm and iptm tests (#868)
* iptmgr

* more iptm testing

* grep call

* progress

* progress

* ipsm

* ioshim

* update tests

* package restructure

* fix broken test and delint

* reduce scope of ioshim

* reduce scope of ioshim

* ioshim scope

* require no error, retrigger ci

* ut return multiple results

* fix tests from master changes

* unexport ioshim

* update ut

* fix tests

* vendor

* test fix

* go version

* go version

* pipeline fixes

* fix tests
2021-07-14 12:53:45 -07:00
Vamsi Kalapala c2b2db1042
[NPM] RETURN early on MARK in DROP chains (#881)
* Returning in DROP chains

* adding a comment about future cleanup of chains:

* Removing duplicate rules in PORT chains of TO/FROM and DROP chain jumps

* Adding an image of the new chain behavior

* Addressing comments

* addressing some comments

* Correcting some UTs to not have the jump rules

* removing jump flag
2021-06-15 10:06:37 -07:00
Mathew Merrick 1fa243e5f5
CI: Add golint-ci (#888)
* add golint-ci

* add gofmt

* enable linters

* uncap count

* fix linting/fmt issues
2021-06-01 16:58:56 -07:00
Vamsi Kalapala 37993588e4
[NPM] Clearing NPM MARKs on packets before accepting them (#823)
* First pass at creating a new chain to accept and clear marks npm added

* Adding rules to default chains
2021-03-18 16:16:00 -07:00
Mathew Merrick d169929048
Npm debug tools (#817)
* add initial debug tools

* export member variables for debug api

* add dependencies

* update metrics and tests

* remove refactor artifacts
2021-03-11 11:47:34 -08:00
Vamsi Kalapala 08f0006dda
[NPM] Default-Deny all bug fix with design change (#799)
* Changed init NPM chains with new chains

* Rearranging target sets

* Fixing UTs

* Fixing UTs

* Fixing UTs
2021-02-26 15:43:12 -08:00
Vamsi Kalapala effc4294d3
[NPM] Consolidating default chains and rules for easier readability (#796)
* Consolidating default chains and rules for easier readability

* Adding tests and checks for allchainsandrules

* [build] Update pipeline and enable debug symbols in bins (#793)

* add symbols, add gcflags, update timeouts, only show azure core errors

Co-authored-by: Mathew Merrick <matmerr@users.noreply.github.com>
2021-02-23 17:14:29 -08:00
Vamsi Kalapala 55f53d5abc
[NPM] Periodic Reconciliation of NPM Chain ordering in FORWARD table (#787)
* merge conflict

* Adding in wait to mitigate defunct processes

* Adding error handling and addressing comments

* Addressing some comments

* Adding a testcase for GetlineNumber

* Adding error message in failures

* Adding error message in failures

* correcting the name for Kubeservices chain
2021-02-18 21:10:37 -08:00
Vamsi Kalapala a1f13a8eea
NPM adhering to both ingress and egress rules (#765)
This change evaluates both INGRESS and EGRESS rules before a decision is taken on a packet. NPM now MARKs a packet for ingress/egress and RETURNs the MARKed packet; the packet is then accepted in the main chain after all ingress and egress rules have been processed.

* first pass trying to return instead of accept

* Adding initial marking capability

* Adding accept on ingress and egress marks

* Correcting an ingress marker

* Correcting unit test cases to show the appropriate markers

* Correcting a comment

* Addressing comments
2021-01-25 12:33:26 -08:00
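A toy model of the MARK-and-RETURN flow described in #765, written in Go as an illustration. It is not NPM's code and not an iptables binding; `Rule`, `runChain`, `mainChain`, and the mark values `ingressMark`/`egressMark` are hypothetical. It shows why replacing ACCEPT with MARK + RETURN in the ingress chain lets an egress DROP still take effect. The model keeps the mark only as a record of which directions allowed the packet.

```go
package main

import "fmt"

// Verdict is the outcome of running a chain in this toy model.
type Verdict int

const (
	Continue Verdict = iota // rule matched but did not end the chain
	Return                  // go back to the calling chain
	Accept
	Drop
)

// Hypothetical mark bits for illustration only; not NPM's actual values.
const (
	ingressMark uint32 = 0x1
	egressMark  uint32 = 0x2
)

// Packet carries only the mark that rules can set.
type Packet struct{ Mark uint32 }

// Rule: when Match is true, OR SetMark into the packet mark, then apply Target.
type Rule struct {
	Match   func(*Packet) bool
	SetMark uint32
	Target  Verdict
}

// runChain walks rules in order; falling off the end acts as an implicit RETURN.
func runChain(rules []Rule, p *Packet) Verdict {
	for _, r := range rules {
		if !r.Match(p) {
			continue
		}
		p.Mark |= r.SetMark
		if r.Target != Continue {
			return r.Target
		}
	}
	return Return
}

// mainChain jumps to the ingress chain, then the egress chain. Only ACCEPT or
// DROP stops evaluation early; once both chains have RETURNed, accept.
func mainChain(ingress, egress []Rule, p *Packet) Verdict {
	for _, chain := range [][]Rule{ingress, egress} {
		if v := runChain(chain, p); v == Accept || v == Drop {
			return v
		}
	}
	return Accept
}

func main() {
	always := func(*Packet) bool { return true }
	denyEgress := []Rule{{Match: always, Target: Drop}}

	// Old behavior: ingress ACCEPTs, so the egress DROP is never evaluated.
	oldIngress := []Rule{{Match: always, Target: Accept}}
	fmt.Println(mainChain(oldIngress, denyEgress, &Packet{}) == Accept) // true

	// New behavior: ingress MARKs and RETURNs, so the egress DROP is reached.
	newIngress := []Rule{{Match: always, SetMark: ingressMark, Target: Return}}
	p := &Packet{}
	fmt.Println(mainChain(newIngress, denyEgress, p) == Drop, p.Mark == ingressMark) // true true

	// Both directions allow: both marks are set and the main chain accepts.
	allowEgress := []Rule{{Match: always, SetMark: egressMark, Target: Return}}
	q := &Packet{}
	fmt.Println(mainChain(newIngress, allowEgress, q) == Accept, q.Mark == ingressMark|egressMark) // true true
}
```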
shchen 0835cae2d1
Change AI log and metrics sending function name in NPM. (#737) 2020-11-23 23:14:31 -08:00
shchen 1330e4aa3b
Add error log and metrics to AI telemetry. (#656)
* Accelerate metrics report from every 30 mins to every 5 mins.

* Add errCountTest metric.

* Refactor SendAiMetrics. AI initialization happens in the main routine, while metrics are sent from another goroutine.

* Add aiMetadata config.

* Add SendErrorMetrics function in ai utils.

* Going to push error log to AI telemetry.

* Add error log to AI telemetry.

* Change error message format.

* Add error log and metrics to AI telemetry.

* Remove unnecessary const.

* Change heartbeat back to every 30 mins.

* Separate sending the log from the SendErrorMetric function for better reuse.

* Change a unit test set name to avoid kernel conflict.

* Address comments. Make error log and metrics sending more generic.

* Fix typo.

* Fix indentation.

* Fix AI initialize issue.

* Remove unnecessary log.

* Use break in if condition.
2020-09-04 10:57:37 -07:00
Hunter Gregory 88ea3c2acd
Prometheus metrics (#590)
* prometheus additions to testmain (commented out right now)

* home of the npm prometheus metrics and tools for updating them, testing them

* add/remove policy metrics

* add/remove iptables rule metric measurements

* add/remove ipset metric measurements

* testing for gauges. want to soon remove the boolean for including prometheus in unit testing

* run http server that exposes prometheus from main

* cleaner test additions with less code

* removed incorrect instance of AddSet in the TestDeleteSet test

* added prometheus annotations to pod templates

* deleted unused file

* much more organized initialization of metrics now. now includes map from metric to metric name

* add ability to get summary count value. now getting gauge values and this new count value are done by passing the metric itself as a param instead of a string

* condenses prometheus testing code base by condensing all prometheus error messages into a function

* added testing for summary counts, condensed prometheus error handling code, and updated calls to use new form for getting metric values

* update based on variable spelling change in metrics package

* Added comments for functions and moved http handler code to the http file

* fixed problem of registering same metric name for different metrics, and passing in the wrong param type for testing

* made prometheus testing folder with interactive testing file. moved old random metric flux testing function over from ipsm_test

* moved testing around again

* fixed spelling mistake

* counting mistake in unit test

* handler variable was in the wrong file. Changed stdout printing to logging

* fixed parameter errors and counting error in a test

* moved utilities for testing prometheus metrics to npm/util. Updated StartHTTP to have an additional parameter for waiting after starting the server

* updated uses of StartHTTP to have the extra parameter

* updated GetValue and GetCountValue uses to use the prometheus features of the util package, which is now moved to a promutil package within npm/metrics/

* removed unnecessary comments, removed print statement, and added quantiles to all summary metrics

* fixed problem of double registering metrics

* wait longer for http server to start

* moved tool in test-util.go to promutil/util.go

* fixed timer to be in milliseconds and updated metric descriptions to mention units

* removed unnecessary comments

* http server always started in a go routine now. Added comment justifying the use of an http server

* debugging http connection refused in pipeline

* fixed syntax error

* removed debugging wrapper around http service

* sleep so that the testing metrics endpoint can be pinged

* redesigned GetValue and GetCountValue so that they don't use http calls

* removed random but helpful testing file - will write about quick testing in a wiki page

* milliseconds were being truncated. now they have decimals

* use direct Prometheus metric commands instead of wrapping them

* removed code used when testing was done through http server. Moved registering to metric creation functions

* added createGaugeVec, updated comments, made all help strings constants

* added metric that counts number of entries in each ipset. still need to add tests

* fixed creation of GaugeVecs, and use explicit labeling instead of order-based labeling now

* updated GetVecValue method signature

* added set to metrics on creation and wrote unit tests for CreateSet, AddToSet, DeleteFromSet, DeleteSet

* use custom registry to limit content that Container Insights scrapes. Also log the start of http server

* wrote TODO item comments for Restore and Destroy (currently these functions are only used in testing)

* NPM won't crash if a Prometheus metric fails to register now (unlikely). Added logging for metric registration/creation, and explicit public function to initialize metrics so that we can finish log config first

* initialize metrics in unit tests

* renamed util.go to test-util.go

Co-authored-by: Hunter Gregory <t-hugreg@microsoft.com>
2020-07-14 19:41:02 -04:00
neaggarwMS cf25cd318a
Update NPM logs to use Logf instead of Printf (#601) 2020-07-06 10:57:44 -07:00
Jaeryn d3db72e1f7
Re-order NPM IpTable Chains/Rules (#572)
* Move AZURE-NPM chain under KUBE-SERVICES chain; move the default allow ESTABLISHED/RELATED entry to the end of the AZURE-NPM chain.

* Find index of KUBE-SERVICES chain.
2020-06-02 15:45:46 -07:00
Jaeryn ef14c8d186
Fix a bug that created a random egress entry when deleting from an ingress-centric policy group. Also added a check to add multiple default entries so that deleting one policy does not affect another. (#551) 2020-04-29 14:32:22 -07:00
Jaeryn e98b789916
Fix NPM Bugs (#542)
* Remove old npm chains which were causing errors on uninit

* Utilize rawNpMap and refrain from updating policies with no change.

* redacted

* add added policy to processedNpMap
2020-04-10 17:54:24 -07:00
Jaeryn e6c4e77476
append jump ipt entries and prepend the rest (#468) 2020-01-02 16:19:58 -08:00
jaer-tsun 61ccd6f387
Fix NPM Regression & Remove TelemetryBuffer Sidecar (#449)
* give precedence to drop rules (over allow)

* - Moving kube-system-chain above target-sets-chain
- Add drop entry at the end of Ingress-From and Egress-To chains when there are non Allow-All* entries

* write logs to stdout (and log file) so that we can see logs via kubectl

* removing kube-system chain and fixing tests

* removing telemetry buffer
2019-12-05 13:58:11 -08:00
Yongli Chen ff7b9c0801
clear stale states on creation (#411)
* clear stale states on creation

* log first UninitNpmChains() call

* adding kube system namespace after reboot
2019-09-30 19:09:49 -04:00
Mathew Merrick c44e775344
NPM test fixes + Azure Pipelines (#402)
Initial Azure Pipelines config, run pipeline tests in container, CNM and NPM fixes
2019-09-20 16:15:08 -07:00
Yongli Chen 36f188cfc9
Merge policies (#390) 2019-08-30 14:02:45 -07:00
Yongli Chen f6bff8e3af
change telemetry to message queue and add npm (#366)
* change telemetry to message queue and add npm

* remove [Azure-NPM] prefix

* remove npmreport url

* fair scheduling

* holds up to 1k reports for each type

* fix cap on reports
2019-06-19 14:08:35 -07:00
tamilmani1989 313878e8c3
added default wait timeout of 60 seconds to get iptables lock (#343) 2019-04-24 16:34:58 -07:00
tamilmani1989 9cfec88252
add following support: (#342)
1. ip6tables command support
2. wait flag with timeout
2019-04-24 16:04:37 -07:00
Yongli Chen 97d3ddc13c
Honoring xtables lock (#315) 2019-03-20 14:25:53 -07:00
Yongli Chen d8d848fe24
Support new network policy definition since Kubernetes version 1.11 (#307) 2019-03-01 15:46:17 -08:00
Yongli Chen 434ddf768f
LIFO order for network policies (#258)
* reverseOrder

* add npm to circleCI
2018-10-12 15:40:45 -07:00
Yongli Chen 6a0f9ff207
Add Azure-npm to provide k8s network policy support (#173)
* address comments

* make azure-npm version consistent with main version

* Npm telemetry (#12)

* azure-npm

* set logging file

* parameterize telemetry API

* avoid null pointer dereference

* add telemetry to npm

* address comments

* add descriptive comments

* add copyright info

* returns on err
2018-07-19 14:23:11 -07:00