I have been unable to run the integration tests locally since upgrading my laptop. I worked with Kevin to figure out the issues, and the tests are working now.
1. Build the test container in no-cache mode, since an old bad version would otherwise not get rebuilt.
2. Change the devcontainer config to force running amd64 rather than arm64.
3. Tweak the scripts to handle the slightly different process names and ps output when running this way.
Now the tests pass on Mac.
* Adding internal/manifest package from Cross-Platform AppHealth Feature Branch
* Running go mod tidy and go mod vendor
* - Add manifest.xml to Extension folder
- Changed GitHub workflow Go version to Go 1.18
- Small refactor in setup function for bats tests.
* Update Go version to 1.18 in Dockerfile
* Add logging package with NopLogger implementation
* Add telemetry package for logging events
* - Add telemetry event Logging to main.go
* - Add new String() methods to vmWatchSignalFilters and vmWatchSettings structs
- Add telemetry event Logging to handlersettings.go
* - Add telemetry event Logging to reportstatus.go
* Add telemetry event Logging to health.go
* Refactor install handler in main/cmds.go to use telemetry event logging
* Refactor uninstall handler in main/cmds.go to use telemetry event logging
* Refactor enable handler function in main/cmds.go to use telemetry event logging
* Refactor vmWatch.go to use telemetry event logging
* Fix requestPath in extension-settings.json and updated 2 integration tests, one in 2_handler-commands.bats and another in 7_vmwatch.bats
* ran go mod tidy && go mod vendor
* Update ExtensionManifest version to 2.0.9 on UT
* Refactor telemetry event sender to use EventLevel constants in main/telemetry.go
* Refactor telemetry event sender to use EventTasks constants that match with existing Windows Telemetry
* Update logging messages in 7_vmwatch.bats
* Moved telemetry.go to its package in internal/telemetry
* Update Go version to 1.22 in Dockerfile, go.yml, go.mod, and go.sum
* Update ExtensionManifest version to 2.0.9 on UT
* Add NopLogger documentation to pkg/logging/logging.go
* Added Documentation to Telemetry Pkg
* - Added a wrapper around HandlerEnvironment to add functionality such as the String() func
- Added String() func to handlersettings struct, publicSettings struct, vmWatchSettings struct and vmWatchSignalFilters struct
- Added Telemetry Event for HandlerSettings and for HandlerEnvironment
* - Updated HandlerEnvironment String() func to use MarshalIndent.
- Updated HandlerSettings struct String() func to use MarshalIndent
- Fixed failing UTs due to nil pointer in embedded struct inside HandlerEnvironment.
* - Updated vmWatchSettings String() func to use MarshalIndent
* Update ExtensionManifest version to 2.0.10 on Failing UT
* removed duplicated UT
* Removed String() func from VMWatchSignalFilters, publicSettings and protectedSettings
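For reference, a minimal sketch of the MarshalIndent-based String() pattern mentioned in the bullets above. The struct fields and JSON tags are hypothetical; the real vmWatchSettings definition is not shown in this log.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// vmWatchSettings is a hypothetical stand-in for the real struct;
// the field names and tags are assumptions for illustration only.
type vmWatchSettings struct {
	Enabled bool     `json:"enabled"`
	Tests   []string `json:"tests,omitempty"`
}

// String renders the settings as indented JSON so they read well in
// telemetry events and logs. encoding/json does not call String(), so
// there is no recursion here.
func (s vmWatchSettings) String() string {
	b, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		return fmt.Sprintf("vmWatchSettings{<marshal error: %v>}", err)
	}
	return string(b)
}

func main() {
	fmt.Println(vmWatchSettings{Enabled: true, Tests: []string{"dns"}})
}
```

Because fmt checks for the Stringer interface, passing the struct to fmt.Println or a `%v` verb picks up this output automatically.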
Changes by @dpoole73
- Fix bug where we were not using the global `vmWatchCommand` variable so the SIGTERM handler was not killing anything
- Set the `Pdeathsig` property on the command so the SIGTERM signal is sent to the sub process on parent process termination
This fixes both issues:
Before the fix, killing the app health process always leaked the vmwatch process.
After the fix:
`kill <pid>` -> log message "Received shutdown request" and vmwatch is killed.
`kill -9 <pid>` -> no log message, vmwatch is killed.
Changes by @klugorosado
- Added Integration tests to kill AppHealth Gracefully with SIGTERM and SIGINT, and validated VMWatch Shutdown.
- Added Integration tests to kill AppHealth Forcibly with SIGKILL, and validated VMWatch Shutdown.
- Added the capability for dev containers to run Integration tests inside.
* Initial checkpoint
* tweak tests
* tweak the scripts
1. use nc for a TCP server instead of a web server for simplicity
2. add variables that control whether a failed cgroup assignment is tolerated, so the tests can run
3. add new test for the case where it fails
* feedback
* feedback
* feedback
* feedback
* Bootstrapping has no integration test regressions
* Add cleanup of VMWatch process during shutdown signals and upon other commands, plus integration test template
* Added integration tests for VMWatch
* Linting
* Fix file vet issues
* attempt to fix handler command: install - creates the data dir
* nit integration tests
* Use handlerenvironment to dictate vmwatch signal folder and verbose log file paths
* Include missing changes in previous commit
* Remove unnecessary changes
* Try to fix docker installation error in go workflow
* Fix integration tests
* Update HandlerManifest with process names for guest agent to monitor cpu/memory usage
* Run linting
* Remove cpu/memory limits in HandlerManifest + update VMWatch binary directory to bin/VMWatch/ + implement VMWatch process retries + update integration tests
* Update test.Dockerfile
* Rename workflow
* Add formatting & linting
* Add logic to do retries on failed tests + don't fail fast
* Minor nits
* Update integration tests + code changes to resolve comments regarding execution of process
* Formatting + Linting + Vet
* Add logic for recover and defer for executing VMWatch. Proper close and read of channel. Also only every 60 seconds
* fix integration tests
* Bump to v2.0.7
* revert unnecessary changes to schema.go
* Small fix to killVMWatch
* Fix logic for killing VMWatch
* v2.0.8 Added Support for dynamic EventsFolder directory from extension Handler Environment (#39)
* - moved handlerenv.go and seqno.go from "github.com/Azure/azure-docker-extension/pkg/vmextension"
- Added EventsFolder with other missing parameters.
* - Removed vmextension lib dependency from VMWatch and other files.
- Updated HandlerEnvironment.json test file.
- Updated VMWatch integration tests.
* - Bump to v2.0.8
* initial devcontainer changes
changes:
1. add devcontainer config
2. add vscode build config
3. add makefile target to set up the appropriate stuff in the container
4. update some line endings and add gitattributes so scripts run
5. fix what seems to be a bug in fake-waagent script as it doesn't work without this fix for me
* update binaries and config to latest
* Resource governance, heartbeat and dev container changes
The main feature change here is the addition of resource governance for linux via cgroups.
We discover the current cgroup and add a sub cgroup for our purposes (limiting cpu to 1% and memory to 40MB)
I also added support for detecting a stuck vmwatch using the heartbeat file and implemented the same logic for restarts from the windows version (3 restarts per 3 hours)
As part of the development of this, I added support for devcontainer execution so we can step through the code from a dev machine into either a WSL session or a linux vm with tools installed.
I added integration tests to check process exit, OOM and cpu throttling. These changes required a few changes to the makefile and scripts.
I also updated the vmwatch binaries and added a script to download the latest ones.
I updated the govendor files using the tool it told me to run; I hope I did this right.
* feedback
* feedback
* Run 'go mod edit -go=1.18' to be consistent with linux extensions repo
* Run linting/formatting
* Fix nits left over from merge conflicts
* Fix app health handler.log directory path
* Change to applicationhealth-extension
* Mistakenly added two VMWatch substatus items
* Adding filtering for tests which can only run on a real linux host (not WSL or docker)
continuing investigation...
* fix time from minutes to hours plus add makefile target to create zip file (for use in testing)
* feedback
* feedback
* add readme
* updated vmwatch version, config schema and commandline
* typo
* test fixes
* test fixes
* add helper script to upload binaries to storage
* change container name
* feedback
* feedback
* typo
---------
Co-authored-by: Frank Pang <frankpang@microsoft.com>
Co-authored-by: frank-pang-msft <92764154+frank-pang-msft@users.noreply.github.com>
Co-authored-by: klugorosado <142627157+klugorosado@users.noreply.github.com>