Commit Graph

118 Commits

Author SHA1 Message Date
Dave Poole c079e29fa1 revert accidental new variable creation 2024-05-14 22:53:17 +00:00
Dave Poole 486672beba preserving the original logic 2024-05-14 21:34:36 +00:00
Dave Poole 89379ba69c fix accidental edit 2024-05-14 21:12:19 +00:00
Dave Poole c84eb7be8d fix error check 2024-05-14 13:22:48 -07:00
Dave Poole c7b28f08d5 fix(cgroups): Fixing the check for systemd-run
Although the tests have been passing on the latest changes, there was a failure in testing last night.

While investigating, I found the cause of the problem.  When you call cmd.Execute("systemd-run"), Go will (sometimes) replace the name with the full path (in this case /usr/bin/systemd-run), so our check for systemd-run mode did not match and the code went down the old path of direct cgroup assignment.

Fixing by being explicit about it and returning a boolean indicating whether resource governance is required after the process is launched.  This brings it back to the way it was in the previous PR iterations but avoids the objections raised there about Linux-only concepts.  When we converge the Windows code here, the implementation of applyResourceGovernance will use Job objects on Windows and the code flow will be the same.
2024-05-14 12:37:10 -07:00
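The pitfall in the commit above matches documented standard-library behavior: `exec.Command` resolves a bare program name through `PATH` when it can, so `cmd.Path` may hold `/usr/bin/systemd-run` rather than `systemd-run`. Below is a minimal sketch of the explicit-boolean approach the message describes. The helper `newVMWatchCommand`, its parameters, and the omission of limit properties are assumptions for illustration, not the extension's actual code.

```go
package main

import (
	"fmt"
	"os/exec"
)

// newVMWatchCommand (hypothetical helper) builds the command and reports
// whether the caller still has to apply resource governance after start.
// The decision comes from the explicit flag, never from inspecting cmd.Path.
func newVMWatchCommand(useSystemdRun bool, bin string, args ...string) (*exec.Cmd, bool) {
	if useSystemdRun {
		// systemd-run is expected to own the limits, so nothing is left to
		// do after launch. Limit properties are omitted here for brevity.
		wrapped := append([]string{"--scope", bin}, args...)
		return exec.Command("systemd-run", wrapped...), false
	}
	// Direct launch: limits must be applied once the process exists.
	return exec.Command(bin, args...), true
}

func main() {
	cmd, governLater := newVMWatchCommand(true, "/bin/true")
	// cmd.Path may be an absolute path on hosts that have systemd-run,
	// which is why a string comparison against "systemd-run" is fragile.
	fmt.Println(cmd.Path, cmd.Args, governLater)
}
```

Returning the flag alongside the command keeps the caller's flow identical on every platform: it launches the process, then applies governance only when told to.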
dpoole73 6d2ff1e5f7
Merge pull request #69 from Azure/dpoole/systemd-run-commandline-fix
Change the commandline used for systemd-run depending on the installed version
2024-05-06 12:58:53 -07:00
Dave Poole 5b29a32e8f feedback 2024-05-06 10:44:02 -07:00
Dave Poole f9ff9c5cea Change the commandline used for systemd-run depending on the installed version
We found when testing on some distros that they had older versions of systemd installed.

Versions before 246 use `MemoryLimit` and later versions use `MemoryMax`, so we need to know which version we have when constructing the commandline.

Also, older versions didn't support the `-E` flag for environment variables and instead use the longer form `--setenv`.  The `--setenv` form is supported in both old and new versions.
2024-05-06 09:51:49 -07:00
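The version-dependent commandline from the commit above could be assembled roughly as follows. This is a sketch under stated assumptions: the function name `systemdRunArgs`, its parameters, and the choice to treat version 246 itself as "new" are illustrative only. `--scope`, `--property=` and `--setenv=` are existing `systemd-run` options.

```go
package main

import (
	"fmt"
	"strconv"
)

// systemdRunArgs (hypothetical) returns the arguments to pass to systemd-run.
// version is the installed systemd major version, memoryBytes the memory cap,
// env a list of NAME=VALUE pairs, and bin the program to launch.
func systemdRunArgs(version int, memoryBytes int64, env []string, bin string) []string {
	// Per the commit message: MemoryLimit before 246, MemoryMax afterwards.
	// Treating 246 itself as "new" is an assumption of this sketch.
	memProp := "MemoryMax"
	if version < 246 {
		memProp = "MemoryLimit"
	}
	args := []string{
		"--scope",
		"--property=" + memProp + "=" + strconv.FormatInt(memoryBytes, 10),
	}
	// --setenv works on old and new versions alike, unlike the short -E form.
	for _, kv := range env {
		args = append(args, "--setenv="+kv)
	}
	return append(args, bin)
}

func main() {
	fmt.Println(systemdRunArgs(245, 80000000, []string{"A=1"}, "/bin/true"))
	fmt.Println(systemdRunArgs(250, 80000000, []string{"A=1"}, "/bin/true"))
}
```

Always emitting the long `--setenv` form avoids a second version branch, which is the simplification the commit message points out.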
Kevin Lugo 62799315b6
Removing Unnecessary Telemetry Events and Log CustomMetrics Changes only (#68)
* Removed Noise Telemetry Events, and more details on error log.

* - Created new CustomMetricsStatusType
- CustomMetrics will now be reported only when there is a change in the CustomMetric field.
- Added commitedCustomMetricsState variable to keep track of the last CustomMetric Value.
2024-05-03 16:24:36 -07:00
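The report-only-on-change behavior described in the commit above can be pictured as a small state holder that remembers the last committed value. The type and method names below are hypothetical and only loosely modeled on the `commitedCustomMetricsState` variable the message mentions.

```go
package main

import "fmt"

// metricsReporter (hypothetical) remembers the last custom-metrics payload
// that was reported so that unchanged values are not reported again.
type metricsReporter struct {
	committed    string // last reported value
	hasCommitted bool   // false until the first report
}

// shouldReport returns true when current differs from the last reported
// value (or nothing was reported yet) and records current as committed.
func (r *metricsReporter) shouldReport(current string) bool {
	if r.hasCommitted && r.committed == current {
		return false
	}
	r.committed = current
	r.hasCommitted = true
	return true
}

func main() {
	r := &metricsReporter{}
	for _, v := range []string{`{"rps":10}`, `{"rps":10}`, `{"rps":12}`} {
		fmt.Println(v, r.shouldReport(v))
	}
}
```

The separate `hasCommitted` flag lets an empty payload count as a real first value instead of being mistaken for "nothing reported yet".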
dpoole73 bd1dbc02e8
Merge pull request #67 from Azure/dev/dpoole/update-vmwatch-5-2
chore: update the latest vmwatch binaries (1.1.1)
2024-05-02 22:29:51 -07:00
Dave Poole 30a2d4c04e update the latest vmwatch binaries (1.1.1) 2024-05-02 14:11:20 -07:00
Kevin Lugo b56f2ad074
Adding Kusto Telemetry to ApplicationHealthLinux v2 (#63)
* Adding internal/manifest package from Cross-Platform AppHealth Feature Branch

* Running go mod tidy and go mod vendor

* - Add manifest.xml to Extension folder
- Changed GitHub workflow Go version to Go 1.18
- Small refactor in setup function for bats tests.

* Update Go version to 1.18 in Dockerfile

* Add logging package with NopLogger implementation

* Add telemetry package for logging events

* - Add telemetry event Logging to main.go

* - Add new String() methods to vmWatchSignalFilters and vmWatchSettings structs
- Add telemetry event Logging to handlersettings.go

* - Add telemetry event Logging to reportstatus.go

* Add telemetry event Logging to health.go

* Refactor install handler in main/cmds.go to use telemetry event logging

* Refactor uninstall handler in main/cmds.go to use telemetry event logging

* Refactor enable handler function in main/cmds.go to use telemetry event logging

* Refactor vmWatch.go to use telemetry event logging

* Fix requestPath in extension-settings.json and update 2 integration tests, one in 2_handler-commands.bats and another in 7_vmwatch.bats

* ran go mod tidy && go mod vendor

* Update ExtensionManifest version to 2.0.9 on UT

* Refactor telemetry event sender to use EventLevel constants in main/telemetry.go

* Refactor telemetry event sender to use EventTasks constants that match with existing Windows Telemetry

* Update logging messages in 7_vmwatch.bats

* Moved telemetry.go to its package in internal/telemetry

* Update Go version to 1.22 in Dockerfile, go.yml, go.mod, and go.sum

* Update ExtensionManifest version to 2.0.9 on UT

* Add NopLogger documentation to pkg/logging/logging.go

* Added Documentation to Telemetry Pkg

* - Added a wrapper to HandlerEnviroment to add additional functionality like the String() func
- Added String() func to handlersettings struct, publicSettings struct, vmWatchSettings struct and vmWatchSignalFilters struct
- Added Telemetry Event for HandlerSettings, and for HandlerEnviroment

* - Updated HandlerEnviroment String to use the MarshalIndent function.
- Updated HandlerSettings struct String() func to use MarshalIndent
- Fixed failing UTs due to nil pointer in embedded struct inside HandlerEnviroment.

* - Updated vmWatchSetting String func to use MarshalIndent

* Update ExtensionManifest version to 2.0.10 on Failing UT

* removed duplicated UT

* Removed String() func from VMWatchSignalFilters, publicSettings and protectedSettings
2024-05-01 23:46:04 -07:00
Mingyang Zheng e8e69f4a0b
Merge pull request #66 from Azure/release-2.0.10
bump version to 2.0.10
2024-04-30 14:47:58 -07:00
dpoole73 b552134a5c
Merge pull request #65 from Azure/dev/dpoole/update-vmwatch-4-30
Updating vmwatch binaries to 1.1.0 package
2024-04-30 14:47:00 -07:00
Mingyang Zheng 65961892b2 bump version to 2.0.10 2024-04-30 14:08:53 -07:00
Dave Poole 152b39ce93 Updating vmwatch binaries to 1.1.0 package 2024-04-30 11:01:11 -07:00
dpoole73 043d2a2773
Merge pull request #64 from Azure/dev/dpoole/cgroup-using-systemd-run
fix(systemd-run): Switch to use systemd-run instead of direct process and cgroup manipulation
2024-04-26 14:37:50 -07:00
Dave Poole 23cb651c36 correcting the search term
I don't know why this passed before. Clearly we kill the process when we fail to assign a cgroup, so I don't know why it would ever return a different message. With this fix the tests pass locally.
2024-04-24 21:44:35 +00:00
Dave Poole 3e94eb670f revert 2024-04-24 20:11:45 +00:00
Dave Poole fea2ebdb9f fix test issue. There seems to be a non-deterministic case where the message can get logged differently 2024-04-24 12:24:09 -07:00
Dave Poole b30ee9181a feedback 2024-04-24 11:37:57 -07:00
Dave Poole 0c9a693cc7 feedback 2024-04-24 11:04:11 -07:00
Dave Poole e4c00e98d7 feedback 2024-04-23 19:54:35 +00:00
Dave Poole c86499738b make sure bash files are lf line endings 2024-04-22 21:47:18 +00:00
Dave Poole fc430b00fc correct typo 2024-04-22 21:46:04 +00:00
Dave Poole 8dae5e7377 Switch to use systemd-run to launch vmwatch
Background:

Our tests have been running fine for a long time but suddenly started failing on specific OS versions.  This was because the process (although initially associated with the correct cgroup that we created) gets moved back to the parent cgroup.  This results in the limits being removed.

I did some research and reached out to various people and found that this is something that has previously been seen.  When a process is started with systemd you are not supposed to manage cgroups directly; systemd owns its own hierarchy and can manipulate things within it.  Documentation says that you should not modify the cgroups within that slice hierarchy directly but instead should use `systemd-run` to launch processes.

The GuestAgent folks saw very similar behavior and switching to systemd-run resolved all their issues.

Changes:

Changed the code to use `systemd-run` to launch the vmwatch process.  Using the `--scope` parameter makes the call wait until the vmwatch process completes.

The process id returned from the call is the actual process id of vmwatch.

I have confirmed that killing vmwatch and killing app health extension still has the same behavior (the PDeathSig integration is working fine) and the aurora tests are working fine with these changes.

NOTE: Because systemd-run is not available in Docker containers, the code falls back to running the process directly and continues to use the old code path in that case.  This should also cover any Linux distros which don't use systemd, where direct cgroup assignment should work fine.
2024-04-22 10:02:51 -07:00
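The fallback described in the NOTE above might look like the sketch below. It assumes that a `PATH` lookup is an acceptable availability check, which is only one possible test since a binary can exist without a running systemd. `systemdRunAvailable` and `launchArgv` are hypothetical names.

```go
package main

import (
	"fmt"
	"os/exec"
)

// systemdRunAvailable reports whether a systemd-run binary is on PATH.
// Assumption: presence of the binary is used as a proxy for a usable systemd.
func systemdRunAvailable() bool {
	_, err := exec.LookPath("systemd-run")
	return err == nil
}

// launchArgv (hypothetical) returns the full argv used to start vmwatch:
// wrapped in "systemd-run --scope" when available, direct otherwise.
func launchArgv(haveSystemdRun bool, bin string, args ...string) []string {
	if haveSystemdRun {
		argv := []string{"systemd-run", "--scope", bin}
		return append(argv, args...)
	}
	// Old code path: run the binary directly and assign cgroups afterwards.
	return append([]string{bin}, args...)
}

func main() {
	fmt.Println(launchArgv(systemdRunAvailable(), "/bin/true"))
}
```

Keeping the availability check separate from argv construction makes both branches testable on machines without systemd.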
Frank Pang 7a9c4ffbe1 Bump to v2.0.9 2024-03-19 12:23:37 -07:00
Kevin Lugo 3ae2bd3f80
Fixing bug to successfully kill AppHealth Gracefully & VMWatch. (#59)
Changes by @dpoole73 
- Fix bug where we were not using the global `vmWatchCommand` variable, so the SIGTERM handler was not killing anything
- Set the `Pdeathsig` property on the command so the SIGTERM signal is sent to the subprocess on parent process termination

This fixes both issues:

Before the fix if we killed app health process, vmwatch process was always leaked

After the fix:
`kill <pid>` -> log message "Received shutdown request" and kill vmwatch.
`kill -9 <pid>`-> no log message, vmwatch is killed

Changes by @klugorosado 
- Added Integration tests to kill AppHealth Gracefully with SIGTERM and SIGINT, and validated VMWatch Shutdown. 
- Added Integration tests to kill AppHealth Forcibly with SIGKILL, and validated VMWatch Shutdown. 
- Added the capability for dev containers to run Integration tests inside.
2024-03-19 12:06:04 -07:00
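The parent-death signal from the commit above is set through `syscall.SysProcAttr`, which is Linux-specific in this form. The sketch below shows only that setting; `newChild` is a hypothetical helper, not the extension's code.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// newChild (hypothetical helper) builds a command whose process is sent
// SIGTERM by the kernel when the thread that created it dies.
// Pdeathsig is Linux-specific, so this sketch only builds on Linux.
func newChild(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Pdeathsig: syscall.SIGTERM}
	return cmd
}

func main() {
	cmd := newChild("sleep", "60")
	// The command is only constructed here, not started.
	fmt.Println(cmd.Args, cmd.SysProcAttr.Pdeathsig)
}
```

The `syscall` documentation notes that the signal is delivered when the creating thread terminates, which can happen before the process itself exits, so long-lived launchers may need to account for that.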
dpoole73 1bc094cf51
Merge pull request #56 from Azure/dev/dpoole/update-vmwatch-to-1.0.13
update vmwatch to 1.0.13
2024-02-15 06:53:28 -08:00
Dave Poole 16ab06cd58 update vmwatch to 1.0.13 2024-02-12 15:09:20 -08:00
dpoole73 b2f10123ea
Merge pull request #55 from Azure/dev/dpoole/tweak-resource-governance
Dev/dpoole/tweak resource governance
2024-02-10 13:15:48 -08:00
Dave Poole d6fedcb57e feedback 2024-02-09 20:27:26 +00:00
Dave Poole a537451ab1 clean up logic a bit 2024-02-09 11:06:03 -08:00
Dave Poole d29ce6a4be add validation of default value 2024-02-08 16:08:17 -08:00
Dave Poole cba77e8a06 Configurable rg settings
tweaking settings based on findings in sql vms
2024-02-08 15:54:59 -08:00
dpoole73 57bca902fd
Merge pull request #54 from Azure/dev/dpoole/update-vmwatch-to-1.0.8
updating vmwatch version to 1.0.8
2024-01-26 09:17:50 -08:00
Dave Poole 9c6393af71 1.0.11 2024-01-25 17:39:58 -08:00
Dave Poole 8f3fd7520f updating to 1.0.10 version 2024-01-24 16:54:13 -08:00
Dave Poole 3791d080c5 move to version 1.0.9 2024-01-22 15:20:55 -08:00
Dave Poole f443d2e7ac updating vmwatch version to 1.0.8 2024-01-22 14:46:10 -08:00
Kevin Lugo 1f5e41b085
Passing Extension Version to VMWatch (#52)
* Added --apphealth-version flag to VMWatch with AppHealth version from manifest.xml

* - Validated Extension Version on existing VMWatch.
- Created bash function to extract Version from manifest.xml.
- GetExtensionManifestVersion now first attempts to get Extension Version from Version passed at build time and uses manifest.xml file as fallback.
2024-01-18 14:02:30 -08:00
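The build-time-first, manifest-fallback lookup described above could be sketched as follows. The function name `extensionVersion`, the `-ldflags` variable, and the simplified manifest layout with a top-level `Version` element are assumptions for illustration.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
)

// Version is intended to be injected at build time, for example with
//   go build -ldflags "-X main.Version=2.0.10"
// It stays empty when nothing is injected.
var Version string

// extensionManifest models only the field this sketch needs.
// Assumption: manifest.xml has a <Version> element directly under its root.
type extensionManifest struct {
	Version string `xml:"Version"`
}

// extensionVersion (hypothetical) prefers the build-time version and falls
// back to the version found in the manifest XML.
func extensionVersion(buildVersion string, manifestXML []byte) (string, error) {
	if buildVersion != "" {
		return buildVersion, nil
	}
	var m extensionManifest
	if err := xml.Unmarshal(manifestXML, &m); err != nil {
		return "", err
	}
	if m.Version == "" {
		return "", errors.New("manifest has no Version element")
	}
	return m.Version, nil
}

func main() {
	manifest := []byte(`<ExtensionImage><Version>2.0.10</Version></ExtensionImage>`)
	v, err := extensionVersion(Version, manifest)
	fmt.Println(v, err)
}
```

Passing the build-time value as a parameter rather than reading the global inside the function keeps the fallback order easy to test.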
Manish Satwani e27e5e5de8
Merge pull request #51 from manishsat/feature/v2/bootstrapVMWatch
adding a way to provide custom container to upload artifact
2024-01-12 16:28:11 -08:00
Manish Satwani 0f218be75c use variable instead of hard coded value 2024-01-13 00:04:38 +00:00
Manish Satwani 1e33779b0c removed extra space 2024-01-13 00:03:36 +00:00
Manish Satwani 23fde9d16e adding a way to provide custom container to upload artifact 2024-01-13 00:02:00 +00:00
Manish Satwani e7ab697c21
Merge pull request #50 from manishsat/feature/v2/bootstrapVMWatch
removing data-type key from json tag for url type, unmarshal is failing otherwise
2024-01-10 14:32:38 -08:00
Manish Satwani 87048b99a3 added asserts on enabled and globalConfigUrl 2024-01-10 22:09:02 +00:00
Manish Satwani 48c44fce7b Adding test case unmarshal public settings 2024-01-10 20:33:32 +00:00
Manish Satwani 3900799438 removing type from tag, unmarshal is failing otherwise 2024-01-10 19:06:10 +00:00
Manish Satwani 1d7ac0a4e7
Merge pull request #48 from manishsat/feature/v2/bootstrapVMWatch
adding more properties to pass down to vmwatch
2024-01-09 17:39:14 -08:00