* fix indexing for restore retries and log failed line
* flush and destroy in separate ipset restore calls
* restorer logging
* check for leaked ipsets
* update v1 and v2 to use iptables wait constant
* fix some UTs that have start error
* remove iptables reference check and member check
* tmp: skip flush for testing
* Revert "tmp: skip flush for testing"
This reverts commit 8c513062f9.
* fix lint
* clarify lint
* address comments
* address nit
* delete v2 chains in v1 and move code around
* leave v1 the same, and add v2 chains to the list to flush/destroy
* update go mod and package name
* Revert "update go mod and package name"
This reverts commit 46b674c440.
* rename folder
* address comments
* remove extra ioutil import statement
* update mod/vendoring
* Revert "update mod/vendoring"
This reverts commit 436897e6cb.
* back to original go.mod
* update build tags for ioutil files used in v1
* put jump from forward to azure-npm chain above the one to kube-services
* update unit test
* add toggle for chain position
* incorporate toggle in iptm and update UTs. v1 controller tests seem broken
* rename toggle name
* jump to azure chain on new ct state and update default toggle (UTs will break)
* make util constant for UTs and fix UT errors (besides ones I get for controllers)
* added missing module args for ctstate NEW
* reconcile jump to azure chain at top
* delete deprecated jump to azure chain on uninit, and fix go lint
* assign correct default toggle value
* addressed comments
* fix UTs after removing index 1 for placing chain first. Also make all tests subtests for check and add forward chain
* set PlaceAzureChainFirst: true
* switch to correct default for PlaceAzureChainFirst
* made prometheus exec time metrics for ipsets and iptables in line with those for network policies (exec time recorded even for failures). Also made prometheus timer variable names clearer.
* fixed faulty prometheus handler test looking for a node metric name when testing the cluster metric handler
* add clarity in comments related to the IPSetInventory metric
* Include prometheus metrics for lists and in DestroyNPMIpsets(). Only make metric updates when there's no error
* refactor prometheus testing and include metric tests for lists and NPMDestroyIpsets()
* better check for empty response to ipset list in DestroyNpmIpsets()
* remove unused clientset from controllers
* replace function for setting ipset inventory with function for removing ipset for better readability. updating comments too
* reset ipset inventory before each unit test
* added unit test for adding to set with pod cache
* remove unused cluster state function and clientset from np manager
* fix build problems: remove clientset from calls to npm.NewNetworkPolicyManager()
* fix logic for destroy ipsets for situation when destroy is called while num ipsets is 0
* delete commented out function
* encapsulated prometheus metrics, refactored prometheus testing for iptm and netpol controller, and removed clientset from controller creation in test files (fixing build error)
* update test for DestroyNpmIpsets() to always use a new Exec
* Locate the key data structure in each controller, remove per-controller locks where possible, and move locking down into the ipset manager
* Create npmNameSpaceCache to manage shared namespace objects and mutex
* Encapsulate listMap and setMap in the ipset manager struct so they are not exposed to other packages.
Remove unnecessary code and clean up initialization code.
* Encapsulate methods and members to avoid unintentional access to variables and to better manage shared resources. Correct UTs and clean up code
* Update expected values in UTs according to the architectural change and clean up code (remove unnecessary comments and duplicated logging)
* Add comments and clean up code (remove redundant code, etc.)
* Remove threadness variable to avoid unintentionally increasing the number of workers in each controller without safe synchronization
* Handle errors and return values correctly, based on lint hints
* Remove error-handling code for iptables and ipset initialization when NPM starts
* Call the correct function to avoid UT failure
* Resolve comments
* Correct lint's complaint
* Correct chain order
* resolved llc lint warnings and renamed variables
* Resolved lint warnings if possible
* Removed unnecessary variables and code
* Add custom encoding and decoding logic for NPMCache
* Revise UT case in server_test.go and resolve lint warning
* Add npm cache file which was revised based on custom encoding and revise corresponding UTs
* Add unit test for npmCache (need to remove redundancy in server_test.go)
* Resolve lint warnings
* Fix save and restore function to work properly with real exec
* Delete unused save and restore code
* Clean up dead code (i.e., save and restore) for ipset
* Remove an unused variable
* Returning in DROP chains
* adding a comment about future cleanup of chains
* Removing duplicate rules in PORT chains of TO/FROM and DROP chain jumps
* Adding an image of the new chain behavior
* Addressing comments
* addressing some comments
* Correcting some UTs to not have the jump rules
* removing jump flag
* merge conflict
* Adding a wait to mitigate defunct processes
* Adding error handling and addressing comments
* Addressing some comments
* Adding a testcase for GetlineNumber
* Adding error message in failures
* correcting the name for Kubeservices chain
This change evaluates both INGRESS and EGRESS rules before deciding whether to accept a packet. NPM now MARKs a packet for ingress/egress and RETURNs the marked packet. The packet is then accepted in the main chain after all ingress and egress rules have been processed.
* first pass trying to return instead of accept
* Adding initial marking capability
* Adding accept on ingress and egress marks
* Correcting an ingress marker
* Correcting unit test cases to show the appropriate markers
* Correcting a comment
* Addressing comments
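The mark-and-return flow described for this change can be sketched as iptables rule specs built in Go. This is only an illustration under stated assumptions: the mark values, the `EXAMPLE-*` chain names, the `allowed-src` ipset, and the `markAndReturn` / `acceptOnMarks` helpers are hypothetical and are not taken from the NPM code. Only the iptables options themselves (`-j MARK --set-xmark`, `-m mark --mark`, `-j RETURN`, `-j ACCEPT`) are standard.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical mark bits in value/mask form, so that setting one mark
// does not clobber the other. The values NPM actually uses may differ.
const (
	ingressMark = "0x2000/0x2000"
	egressMark  = "0x1000/0x1000"
)

// markAndReturn builds the two rule specs that stand in for a direct ACCEPT:
// MARK the matching packet, then RETURN marked packets to the calling chain.
func markAndReturn(match []string, mark string) [][]string {
	markRule := append(append([]string{}, match...), "-j", "MARK", "--set-xmark", mark)
	returnRule := []string{"-m", "mark", "--mark", mark, "-j", "RETURN"}
	return [][]string{markRule, returnRule}
}

// acceptOnMarks builds main-chain rules that ACCEPT packets carrying a mark.
// They are meant to sit after the jumps to the ingress and egress chains.
func acceptOnMarks(marks ...string) [][]string {
	rules := make([][]string, 0, len(marks))
	for _, m := range marks {
		rules = append(rules, []string{"-m", "mark", "--mark", m, "-j", "ACCEPT"})
	}
	return rules
}

func main() {
	// Assumed match: source address is in an ipset named "allowed-src".
	match := []string{"-m", "set", "--match-set", "allowed-src", "src"}
	for _, r := range markAndReturn(match, ingressMark) {
		fmt.Println("iptables -A EXAMPLE-INGRESS", strings.Join(r, " "))
	}
	for _, r := range acceptOnMarks(ingressMark, egressMark) {
		fmt.Println("iptables -A EXAMPLE-MAIN", strings.Join(r, " "))
	}
}
```

Because MARK is a non-terminating target, the packet falls through to the RETURN rule in the same chain, and the final ACCEPT decision is deferred to the main chain.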
* Accelerate metrics report from every 30 mins to every 5 mins.
* Add errCountTest metric.
* Refactor SendAiMetrics. AI initialization happens in the main routine, while metrics are sent from a separate goroutine.
* Add aiMetadata config.
* Add SendErrorMetrics function in ai utils.
* Going to push error log to AI telemetry.
* Add error log to AI telemetry.
* Change error message format.
* Add error log and metrics to AI telemetry.
* Remove unnecessary const.
* Change heartbeat back to every 30 mins.
* Separate log sending from the SendErrorMetric function for better reuse.
* Change a unit test set name to avoid kernel conflict.
* Address comments. Make error log and metrics sending more generic.
* Fix typo.
* Fix indentation.
* Fix AI initialize issue.
* Remove unnecessary log.
* Use break in if condition.
* prometheus additions to testmain (commented out right now)
* home of the npm prometheus metrics and tools for updating them, testing them
* add/remove policy metrics
* add/remove iptables rule metric measurements
* add/remove ipset metric measurements
* testing for gauges. want to soon remove the boolean for including prometheus in unit testing
* run http server that exposes prometheus from main
* cleaner test additions with less code
* removed incorrect instance of AddSet in the TestDeleteSet test
* added prometheus annotations to pod templates
* deleted unused file
* much more organized initialization of metrics now. now includes map from metric to metric name
* add ability to get summary count value. now getting gauge values and this new count value are done by passing the metric itself as a param instead of a string
* condenses prometheus testing code base by condensing all prometheus error messages into a function
* added testing for summary counts, condensed prometheus error handling code, and updated calls to use new form for getting metric values
* update based on variable spelling change in metrics package
* Added comments for functions and moved http handler code to the http file
* fixed problem of registering same metric name for different metrics, and passing in the wrong param type for testing
* made prometheus testing folder with interactive testing file. moved old random metric flux testing function over from ipsm_test
* moved testing around again
* fixed spelling mistake
* counting mistake in unit test
* handler variable was in the wrong file. Changed stdout printing to logging
* fixed parameter errors and counting error in a test
* moved utilities for testing prometheus metrics to npm/util. Updated StartHTTP to have an additional parameter for waiting after starting the server
* updated uses of StartHTTP to have the extra parameter
* updated GetValue and GetCountValue uses to use the prometheus features of the util package, which is now moved to a promutil package within npm/metrics/
* removed unnecessary comments, removed print statement, and added quantiles to all summary metrics
* fixed problem of double registering metrics
* wait longer for http server to start
* moved tool in test-util.go to promutil/util.go
* fixed timer to be in milliseconds and updated metric descriptions to mention units
* removed unnecessary comments
* http server always started in a go routine now. Added comment justifying the use of an http server
* debugging http connection refused in pipeline
* fixed syntax error
* removed debugging wrapper around http service
* sleep so that the testing metrics endpoint can be pinged
* redesigned GetValue and GetCountValue so that they don't use http calls
* removed random but helpful testing file - will write about quick testing in a wiki page
* milliseconds were being truncated. now they have decimals
* use direct Prometheus metric commands instead of wrapping them
* removed code used when testing was done through http server. Moved registering to metric creation functions
* added createGaugeVec, updated comments, made all help strings constants
* added metric that counts number of entries in each ipset. still need to add tests
* fixed creation of GaugeVecs, and use explicit labeling instead of order-based labeling now
* updated GetVecValue method signature
* added set to metrics on creation and wrote unit tests for CreateSet, AddToSet, DeleteFromSet, DeleteSet
* use custom registry to limit content that Container Insights scrapes. Also log the start of http server
* wrote TODO item comments for Restore and Destroy (currently these functions are only used in testing)
* NPM won't crash if a Prometheus metric fails to register now (unlikely). Added logging for metric registration/creation, and explicit public function to initialize metrics so that we can finish log config first
* initialize metrics in unit tests
* renamed util.go to test-util.go
Co-authored-by: Hunter Gregory <t-hugreg@microsoft.com>
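
The timer fixes in this commit ("fixed timer to be in milliseconds", "milliseconds were being truncated. now they have decimals") come down to how a `time.Duration` is converted. A minimal sketch follows; the helper names `truncatedMillis` and `fractionalMillis` are hypothetical and are not the functions used in the metrics package.

```go
package main

import (
	"fmt"
	"time"
)

// truncatedMillis shows the lossy form: integer division on the Duration
// drops the sub-millisecond part before the value becomes a float.
func truncatedMillis(d time.Duration) float64 {
	return float64(d / time.Millisecond)
}

// fractionalMillis converts to float64 first, so the fraction is kept.
func fractionalMillis(d time.Duration) float64 {
	return float64(d) / float64(time.Millisecond)
}

func main() {
	d := 1500 * time.Microsecond
	fmt.Println(truncatedMillis(d))  // prints 1
	fmt.Println(fractionalMillis(d)) // prints 1.5
}
```

A value produced like `fractionalMillis` is the kind of float a summary metric described in milliseconds could observe.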
* Move AZURE-NPM chain under KUBE-SERVICES chain; Move default allow CONNECTED/RELATED entry to the end of AZURE-NPM chain.
* Find index of KUBE-SERVICES chain.
* Remove old npm chains which were causing errors on uninit
* Utilize rawNpMap and refrain from updating policies with no change.
* redacted
* add added policy to processedNpMap
* give precedence to drop rules (over allow)
* Moving kube-system-chain above target-sets-chain
* Add drop entry at the end of Ingress-From and Egress-To chains when there are non Allow-All* entries
* write logs to stdout (and log file) so that we can see logs via kubectl
* removing kube-system chain and fixing tests
* removing telemetry buffer
* change telemetry to message queue and add npm
* remove [Azure-NPM] prefix
* remove npmreport url
* fair scheduling
* holds up to 1k reports for each type
* fix cap on reports
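
The last few telemetry items (message queue, fair scheduling, a cap of 1k reports per type) could be sketched roughly as below. This is a guess at the shape, not the actual telemetry code: the `reportQueue` type, its methods, and the report type names are all hypothetical, and the sketch is not safe for concurrent use without an added lock.

```go
package main

import "fmt"

// maxReportsPerType mirrors the "up to 1k reports for each type" cap.
const maxReportsPerType = 1000

// reportQueue is a hypothetical bounded queue that groups reports by type.
type reportQueue struct {
	types   []string            // report types in first-seen order
	pending map[string][]string // queued reports per type
	next    int                 // index of the type to serve next
}

func newReportQueue() *reportQueue {
	return &reportQueue{pending: make(map[string][]string)}
}

// push enqueues a report. It returns false, dropping the report,
// when that type has already reached its cap.
func (q *reportQueue) push(kind, report string) bool {
	if _, seen := q.pending[kind]; !seen {
		q.types = append(q.types, kind)
		q.pending[kind] = nil
	}
	if len(q.pending[kind]) >= maxReportsPerType {
		return false
	}
	q.pending[kind] = append(q.pending[kind], report)
	return true
}

// pop returns the next report, rotating across types so that one busy
// type cannot starve the others. ok is false when nothing is queued.
func (q *reportQueue) pop() (report string, ok bool) {
	for i := 0; i < len(q.types); i++ {
		kind := q.types[(q.next+i)%len(q.types)]
		if items := q.pending[kind]; len(items) > 0 {
			q.pending[kind] = items[1:]
			q.next = (q.next + i + 1) % len(q.types)
			return items[0], true
		}
	}
	return "", false
}

func main() {
	q := newReportQueue()
	q.push("typeA", "a1")
	q.push("typeA", "a2")
	q.push("typeB", "b1")
	for r, ok := q.pop(); ok; r, ok = q.pop() {
		fmt.Println(r) // a1, b1, a2 in that order
	}
}
```

Round-robin over types is one simple reading of "fair scheduling"; a weighted or time-based scheme would fit the same interface.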