* Move network utils functions with iptables to new file
* Add receiver to iptables and create interface
* Resolve conflicts from rebasing
* Add changes for building on windows
* Address linter issues
* Address windows linter issues
* Invert if condition for linter nesting
* Scope iptables interfaces to package
* Rename iptables client to avoid stuttering
* Move EnableIPForwarding to snat linux
* Rename ipTablesClientInterface to ipTablesClient
* Address linter issues from moving enable ip forwarding function
* Rename after rebase
* feat: 🌈 StatelessCNI: Adding getEndpoint and UpdateEndpoint API to CNS (#2102)
* Adding getEndpoint and UpdateEndpoint API to CNS with the respective clients in support of stateless CNI.
* Updating the unit tests and address the comments.
* Addressing the comments.
* Addressing the comments regarding CNS support for Stateless CNI
* Addressing the PR comments
* 🌈 feat: adding flags for stateless cni (#2103)
feat: stateless cni
* 🌈 feat: StatelessCNI: Applying stateless CNI mode changes in network package. (#2197)
* Applying stateless CNI mode in network package.
* Addressing the comments.
* feat: create stateless cni binary for swift (#2275)
* enabling CNS telemetry
* Master rebase changes
* CNI Telemetry enabled on CNS
* Stateless CNI changes.
* making change to CNSendpointStorePath
* Updating makefile to avoid creating stateless CNI release.
---------
Co-authored-by: Vipul Singh <vipul21sept@gmail.com>
* Add vm and vnet ns block wireserver port 80 rule
* Use existing variable for known ip
* Move code to networkutils
* Address feedback
* Address iptables version feedback
* Address protocol and format feedback
* Add comments
* Remove cidr in case ipv6 is used
* Make initial changes to enable forwarding
Reuse existing code to enable ip forwarding
Add forwarding message
Add transparent vlan config to dropgz
Add enable ip forward to each network namespace for redundancy
* Add general run with retries function
* Migrate existing code to use retry function
* Add retries for checking of vnet and container veth creation
* Add preliminary repair item fixes
* Clarify log message
* Fix unit tests, move ensure clean populate vm to add endpoints, remove ip forwarding fix as it is in other branch
* Move setting link netns and confirmation to function
* Add tests for setLinkNetNSAndConfirm
* Address linter issues
* Add vm ns to vnet namespace log statement
* Fix after rebasing in swift refactoring changes from master
* Address feedback and check error message
* Address feedback
* Address linter issue
* Address log feedback
* Address feedback by asserting any error is an interface not found error
* Enable ipv4 forwarding on network creation
* Add multitenancy transparent vlan conflist to dropgz
* Test if applying fix each time works
* Address linter issues
* Revert "Test if applying fix each time works"
This reverts commit 8989dedc2e.
* Remove overlap in adding dropgz conflist
* Add unit test if forwarding fails
* Make error handling consistent with ipv6 forwarding
* Address linter issue
* wip: apply dirty NetPols every 500ms in Linux
* only build npm linux image
* fix: check for empty cache
* feat: toggle for netpol interval. default 500 ms
* ci: remove stages "build binaries" and "run windows tests"
* wip: max batched netpols (toggle-specified)
* ci: remove manifest build/push for win npm
* wip: handle ipset deletion properly and max batch for delete too
* fix: correct remove policy
* fix: only remove policy if it was in kernel
* finalize toggles, allowing ability to turn off iptablesInBackground
* ci: conf + cyc use PR's configmaps
* fix: lints
* fix dp toggle: iptablesInBackground
* fix lock typo and config logging
* fix background thread. add comments. only add tmp ref when enabled
* copy pod selector list
* fix: removepolicy needs namespace too
* rename opInfo to event
* fix: fix references and prevent concurrent map read/write
* tmp: debug logging
* fix: missing set references by swap keys and values
* Revert "tmp: debug logging"
This reverts commit 70ed34c714ea4a6d009a1fe90a7168be4bedd5bf.
* fix: add podSelectorList to fake NetPol
* log: do not print error when failing to delete non-existent nft rule
* log: verbose iptables bootup
* log: use fmt.Errorf for clean logging
* log: never return error for iptables in background and fix some lints
* fix: activate/deactivate azure chain rules
* fix: correctly decrement netpols in kernel
* ci: run UTs again
* ci: update profiles. default to placefirst=false
* address comment: rename batch to pendingPolicy
* refactor: make dirty cache OS-specific
* test: UTs
* test: put UT cfg back to placefirst to not break things
* ci: update cyclonus workflows
* fmt: address comment & lint
* fmt: rename numInKernel to policiesInKernel
* log: switch to fmt.Errorf
* fmt: whitespace
* feat: resiliency to errors while reconciling dirty netpols
* log: temporarily print everything for ipset restore
* fix: remove nomatch from ipset -D for cidr blocks
* test: UTs for non-happy path
* test: fix hns fake
* fix: don't change windows. let it delete ipsets when removing policies
* fix windows lint
* fix: ignore chain doesn't exist errors for iptables -D
* feat: latency and failure metrics
* test: update exit code for UT
* metrics: new metrics should go in node-metrics path
* style: simplify nesting
* style: move identical windows & linux code to shared file
* ci: remove v1 conformance and cyclonus
* feat: add NetPols in background from the DP (revert background code in pMgr)
* style: remove "background" from iptables metrics
* revert changes in ipsetmanager, const.go, and dp.Remove/UpdatePolicy
* style: whitespace
* perf: use len() instead of creating slice from map
* remove verbosity for iptables bootup
* build: add return statement
* style: whitespace
* build: fix variable shadowing
* build: fix more import shadowing
* build: windows pointer issue and UT issue
* test: fix UT for iptables error code 2
* ci: enable linux scale test
* ci: revert to master pipeline.yaml
* revert changes to chain-management. do changes in PR #2012
* log: change wording
* test: UTs for netpol in background
* log: wording
* feat: apply ipsets for each netpol individually
* config: rearrange ConfigMap & update capz yaml
* fix: windows bootup phase logic for addpolicy
* feat: restrict netpol in background to linux + nftables
* test: skip nftables check for UT
* style: netpols[0] instead of loop
* log: address log comments
* style: lint for long line
---------
Co-authored-by: Vamsi Kalapala <vakr@microsoft.com>
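
Several entries above describe applying dirty NetPols on an interval (default 500 ms) with a cap on how many are batched per pass. As a rough illustration only, assuming a hypothetical `pendingPolicies` type and `drain` method that are not taken from the NPM code, the batching step could be sketched as:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// pendingPolicies is a hypothetical dirty cache: it records the keys of
// NetPols that still need to be applied to the kernel.
type pendingPolicies struct {
	mu   sync.Mutex
	keys map[string]struct{}
}

func newPendingPolicies() *pendingPolicies {
	return &pendingPolicies{keys: make(map[string]struct{})}
}

// add marks a policy as dirty; adding the same key twice is a no-op.
func (p *pendingPolicies) add(key string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.keys[key] = struct{}{}
}

// drain removes and returns at most maxBatch keys, sorted so the result is
// deterministic. Keys beyond the cap stay pending for the next pass.
func (p *pendingPolicies) drain(maxBatch int) []string {
	p.mu.Lock()
	defer p.mu.Unlock()
	if maxBatch < 0 {
		maxBatch = 0
	}
	all := make([]string, 0, len(p.keys))
	for k := range p.keys {
		all = append(all, k)
	}
	sort.Strings(all)
	if len(all) > maxBatch {
		all = all[:maxBatch]
	}
	for _, k := range all {
		delete(p.keys, k)
	}
	return all
}

func main() {
	p := newPendingPolicies()
	for _, k := range []string{"ns1/allow-dns", "ns1/deny-all", "ns2/allow-web"} {
		p.add(k)
	}
	fmt.Println(p.drain(2)) // first pass takes two policies
	fmt.Println(p.drain(2)) // second pass takes the remaining one
}
```

In a background loop, `drain` would be called once per tick of a `time.Ticker` set to the configured interval; holding the mutex for both `add` and `drain` is what prevents the concurrent map read/write mentioned above.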
* set constant mac for host veth interface
* fixed a race issue in transparent-vlan where a delete can happen after an add and remove the route added by the ADD call
* moved log to place where its executed
* enable proxy arp on bridge to allow public connectivity from apipa interface
* validate newly created namespace is not same as host namespace
* addressed comments and added UTs
* fixed cni delete call for linux multitenancy
* lint fixes
* windows lint fixes
* lint fixes
* fix issues with network namespace creation and vlan interface creation
* Removed deletehostveth flag and delete host veth on delete endpoint trigger
* lint fix
* address comment
* deletion of host veth interface on error in transparent_vlan mode
* fixed a typo character
* Fix lint issues
* Windows lint fix
* Windows lint fix
* add netio import
* Fixed newly added ut
* reverting pre-push change
* fix a typo
* set kubeconfig on capz
* update dockerfile
* test network name Calico
* add base acls
* add WindowsNetworkName toggle and revert hard coded Calico parts
* update base acls for calico and add UTs
* capitalize calico network name
* fix connectivity. try with host allow acls
* revert change to policy_windows.go
* more UTs and add base ACLs for other "new endpoint" scenario
* run all UTs
* update npm image to .42
* add log line
* allow traffic going inter-node
* Revert "allow traffic going inter-node"
This reverts commit e1014822d5.
* add long-runner pod for testing vfp tags in capz
* fix lints
* Moving the lock from InitializeKeyValueStore() function to restore/save functions to improve cni performance on windows.
* fix: use defer function to unlock statefile.
* fix: fixing the IPAM lock and defer func
* fix: Optimizing cni file lock by moving SetSdnRemoteArpMacAddress() on startup for CRD and MultitenantCRD mode.
* adding store lock on telemetry service start to avoid race condition on windows.
* wip with StrictlyHasSetPolicies approach
* better approach to getting all set policies
* wip for rigorous win dp UTs
* marshal setpolicies in hns mock and dont short circuit in UTs
* policy stuff and update test cases
* marshal ACLs in hns mock
* more UTs and minor refinements
* option to apply dp or not
* address cmp.Equal and t.Helper comments
* dpEvent returns error and better defined concurrency
* remove unnecessary logic in concurrent test code
* approach #3 emulating cyclonus
* namespace method for podmetadata
* refactor Action structure and TestCase wait group behavior
* hnsactions and renaming a file
* refactor to Serial and ThreadedTestCase structs, and move files to dp pkg
* hns latency hard coded to be the same for all threaded test cases
* fix build error after rebasing
* export fake hns network id
* address comments on multierr and terminology
* add comment about pod metadata in controller
* pod update and delete actions
* move ApplyDPAction to top
* namespace actions and rename some fields of UpdatePod
* adding code comments
* reconcile action
* fix bug in key-val ipsets
* implement all previous test cases
* fix incorrect error wrapping in dataplane.go
* multi-job tests are working. updated terminology from routine to job
* MultiErrManager instead of dependency for multierr
* return to the channel approach for multierr, now using FailNow instead of asserting on channel length
* fix some lints
* fix more lints
* [NPM][Fix] Adding redundant check to ignore endpoint not found while removing policies
* adding error in logline
* adding error in logline
* making sure applyDP error is handled correctly
* revert changes in controllers, covered in a different PR
* adding UTs
* Adding constant hardware address to the veth
* update the variable and error log message
* Removing the shadow variable
* Removing the shadow variable
* Adding the hardware address during endpoint creation
* Refactoring based on the comments
* Updated based on comments
* removing merge conflicts
* update the function parameter
* Addressing comments
* go lint errors
* updating the IPAddr variable
* updating the state to NUD_PROBE for transparent endpoint
* Reverting the hardware address to use NUD_PERMANENT
Co-authored-by: root <root@DESKTOP-IS3TR2T.redmond.corp.microsoft.com>
* Disabled rp filter to enable packet tunneling
Tests ok (all basic functionality, 2 VMs, NS, delete, add)
* Added tests
* Typo
* Typo
* No need to disable rp filter in VM NS
* Native Endpoint Client Add Endpoints
* AddEndpointRules, ConfigureContainerInterfacesAndRoutes
* Changed interface names, log statements
nw.extIf.Name > eth0 (eth0)
eth0.vlanid > eth0.X (eth0.1)
%s%s hostIfName > vnet (A1veth0)
%s%s-2 contIfName > container (B1veth0)
* Renaming, using lib to set ns
* Namespace "path" is /var/run/netns/<NS>
* Loopback set up, Remove auto kernel subnet route
* Cannot set link to up if it's in another NS
* Multiple containers on same VNET NS
* Delete Endpoint routes on Delete
* Minimizing netns usage
* Moving NS Exec Code
* Further minimized netns.Set usage
* Moved helper methods down, drafted tests
* Removed DevName from Route Info, more tests
* Test existing vnet ns, delete endpoint
* NetNS interface for testing
* Separated tests by namespace
* Endpoints delete if they cannot be moved into NS
* Namespace netns tests
* Added Native Client to deleteEndpointImpl
* Deletion of Endpoints Impl and Tests
* Cleaned code (Tests ok)
* Moved mock/netns to package (Tests ok)
* Fixing Netns (wip)
Moved netnsinterface to consumer package (network).
Removed "Netns" from "NewNetns" and "NewMockNetns" as it is unambiguous.
Changed uintptr to int and cast the int to uintptr when needed later.
* Using errors.Wrap for error context (wip)
* Removed sentence case (wip)
* Removing variable predeclaration
* Removed NewNativeEndpointClient
Directly instantiating struct because nothing special happens in NewNativeEndpointClient
* Removed generics from ExecuteInNS
* Removed uintptr from mocknetns, tests compile
Forgot to remove uintptr from mocknetns
* Fix tests, lint
* Fixes from linter
Works on VMSS
* Replacing references to ethX with vlan veth
* Removed unnecessary log
* Removed unnecessary mac, fix tests
* Mockns method name enum
* Unable to use GetNetworkInterfaceByName due to NS
If I use GetNetworkInterface, I need to be in the vnet NS, but that means I will need to call ExecuteInNS, which causes tests to fail.
* Fixes from linter
* Assume if NS exists, vlan veth exists
Tests ok
* Fixes for Linter
* Snat refactor
* Fix delete tests
* Fix delete tests bug
* More snat refactor
* Breaking, prepping for Native Snat
Delete native endpoint snat route linux to remove errors and in theory, ovs should work fine again.
* Go mod tidy for linting
Hopefully this fixes the windows lint error
* Add fields to native endpoint client for snat
* Using New() func to create Native Client
Creation of the native endpoint client is too complicated to directly instantiate.
* Snat defaults
* Insert SNAT entry points
* Native Snat error handling
* Breaking, decouple ovsctl from snat
Proposed Solution implementation
Moved ovsctlClient.AddPortOnOVSBridge to ovs_endpoint_snatroute_linux.go. Removed ovsctlclient from NewSnatClient. Removed ovsctlClient from testing file.
* Delete unnecessary ovssnat files
* No lint on vishvananda netns
Maybe this will fix the windows linter?
* Build linux only for netns package
Maybe this fixes the linter error?
* Remove nolint to see if linter fails
* Breaking, removed bridgeName
bridgeName refers to the OVS Switch I believe
* If native uses snat bridge, should also get IP
* Breaking, Decouple or Wrap snat route
* Check to see if snat triggered
* Snat behaviors specific to ovs/native
* Pass the pointer
Add/Delete ok
* Renaming to make consts public
* Breaking, moving ovs specific parts of snat to ovs
* Remove enable infra vnet (Tests ok)
Tested:
Allow Host to NC only
Allow NC to Host only
Allow both
Wget
Ping between containers
Warning: Enable snat is still hard coded to true!!!
* Move add port to after exists() check
* Moved netns interface to caller, generalized tests
Tests ok, Native ok
* Typos
* Reordered if statement, unwrapped arp
Tests ok, ping ok, wget ok
* Linted, wrapping errors
* Run gofumpt on the entire network package
* Code markers removed, clean (Tests ok)
OVS & Native:
- Ping between two containers same VM, no packets on bridge
- Ping between two containers diff VM, no packets on bridge
- Ping other container not in vnet, no packets on bridge
- Ping snat to container, packets on bridge
- Ping container to snat, packets on bridge
- Tcpdump confirmed on azSnatBr
- Deletion of containers deletes appropriate interfaces
* Renamed veth, fixed logs
* Made deleteEndpoints logic clearer, renamed error
* Renamed eth0 to primaryHostIfName, vlanEth to vlanIf
* Deleted debug log
* Corrected merge (hardware addr) (Tests ok)
* Renamed vlan veth to hostExtIf_vlanID, Disabled RA
With the name eth0.2, disabling RA looks for a folder "eth0" and then a subfolder "2" ("eth0/2"), when it should look for a folder literally named "eth0.2". To avoid this, the naming scheme now uses an underscore instead. (Tests ok)
* Renamed Native to TransparentVlan
Confirmed basic functionality on VM with correct mode
* Makefile updated
* Create azure-windows-multitenancy-transparent-vlan.conflist
* Unified snat err format
* Rename to transparent-vlan
* Route table support added to local netlink
* Moved SNAT to end of function
* Defer deleting vlan interface on failure
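
The rename from a dotted vlan interface name to an underscore one (see the "Disabled RA" entry above) comes down to how a dotted sysctl key maps onto a directory path. The sketch below assumes the disable-RA step builds a dotted `net.ipv6.conf.<if>.accept_ra` key and translates dots to slashes; both helpers are hypothetical and only illustrate the failure mode:

```go
package main

import (
	"fmt"
	"strings"
)

// sysctlPath converts a dotted sysctl key into a /proc/sys path using a
// plain dot-to-slash translation. Illustrative helper, not repository code.
func sysctlPath(key string) string {
	return "/proc/sys/" + strings.ReplaceAll(key, ".", "/")
}

// acceptRAKey builds the accept_ra key for an interface (hypothetical helper).
func acceptRAKey(ifName string) string {
	return "net.ipv6.conf." + ifName + ".accept_ra"
}

func main() {
	// The dot inside "eth0.2" is treated as a separator, so the path
	// points at a directory "eth0" with a subdirectory "2".
	fmt.Println(sysctlPath(acceptRAKey("eth0.2")))
	// With an underscore the interface name stays a single path element.
	fmt.Println(sysctlPath(acceptRAKey("eth0_2")))
}
```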
* Remove azure-vnet-telemetry for windows multitenancy; the telemetry service for windows multitenancy will be started from cns.
* start telemetry service from cns
* lint and log fix
* minor change
* addressed comment
* initial implementation with timeout
* initial implementation with timeout hns
* modify test
* modify code slightly
* updating to read in timeout flag and settings
* updating to read in timeout settings
* remove extra space
* correct a typo
* timeout value greater than zero for detection
* add a couple of UTs and remove needless code
* including timeout in hnsv1
* wip
* address comments
* address comments
* suppress linter errors and update conflist
* fix linter and ensure we don't regress our tests
* updating with PR feedback
* addressing comments
* updating linter warning
* update to address TM's comments
* fix lint error
* correct a linter spacing complaint
* remove fmt.sprintf
* removed cni read config log
* removed duplicated and spam logs
* addressed comment
* commit
* reverting back to old permission
* revert files back to original state
* addressing hunter comments
* feat: don't use CNS for CNI DEL command in windows multitenancy
* go fmt
* go fmt take 2
* fix: don't fallback to CNS for getNetwork or deleteHostNCApipaEndpoint, handle errNetworkNotFound
* test: add test for FindNetworkIDFromNetNs
* fix: getNetworkName needs to fallback to CNS when not found in state file for ADD
* fix: simplify the deleteHostNCApipaEndpoint function
* fix: linter
* fix: cnm should compile
* fix: always return retriable error for endpoint deletion failure
* fix: handle npe in cns/hnsclient by not using that package
* fix: logging
* fix: don't try cns if there is no multitenancy client
* fix: don't call CNS twice during ADD cmd
* fix: use hns wrapper, add some logging, don't return error when endpoint is already deleted
* added dsr changes for windows
* fixed lint and added unit test
removed unused error
* skip adding dsr policy for hnsv1
* addressed comments
lint fix
* fixed windows uts