6.0 KiB
0028 Blame collector hang detection
Summary
Blame data collector now supports a new mode meant to help detect and fix hangs in test code. This mode does not introduce any perf hit on the test run as the proc dump process is only started after the specified timeout interval has elapsed.
Motivation
Whenever running tests in CI systems if a hang occurred it would generally lead to a timeout and abrupt cancelling of the CI pipeline without giving an oppurturnity for the test platform or other scripts to attach the necessary logs/dumps required to analyse the hang.
Pre-Requisites
This was introduced in testplatform version 16.4.0-preview-20191007-01 (This or a higher version is required)
Working
If the testhost does not send any messages to the datacollector for the specified duration then it is inferred as a hang and a dump is collected and the testhost process is killed to ensure any available attachments and logs are gracefully attached to the trx file and propagated up to the chain in case of a CI system for further analysis of the hang.
Steps to configure
Set the PROCDUMP_PATH
environment variable to the full path to the directory containing
the procdump.exe
and procdump64.exe
tools.
This path may be %ProgramData%\chocolatey\bin\
if you use choco install procdump
to acquire these tools.
Add the required settings as shown below in the .runsettings file sample below:
<RunSettings>
<DataCollectionRunSettings>
<DataCollectors>
<DataCollector friendlyName="blame" enabled="True">
<Configuration>
<CollectDumpOnTestSessionHang TestTimeout="300000" DumpType="mini" />
<ResultsDirectory>%AGENT_TEMPDIRECTORY%</ResultsDirectory>
</Configuration>
</DataCollector>
</DataCollectors>
</DataCollectionRunSettings>
</RunSettings>
Note: This works along with (but can also be used independent of) the existing collect dump option introduced in RFC 0024.
Supported options
Option | Description |
---|---|
DumpType |
If you choose to collect a full process dump. It takes values mini /full . By default, a mini dump will be created. |
TestTimeout |
Duration of inactivity in milliseconds (no test events from the test host) after which the data collector assumes a hang has occurred and proceeds to collect a dump and kill the test host process. |
ResultsDirectory |
The path must exist, but files will not be permanently stored there. Test results ultimately go to a TestResults folder under the test project directory by default, or where the --results-directory parameter specifies. |
Invoking the tests
Add the --settings path-to\test.runsettings
parameter to dotnet test
or vstest.console.exe
.
Dump collection with Blame data collector is only supported on Windows for now. For non-Windows agents, omit the .runsettings file or use one that does not activate the Blame collector.
Use the --results-directory
to control where the Sequence_<guid>.xml
and .dmp
files are placed.
Outputs
The Sequence_guid.xml
and .dmp files will be placed under the test project source directory in a TestResults\<guid>\
folder.
Note that the test runner may report a failure to produce the dump file, such as the one shown below:
Data collector 'Blame' message: System.IO.FileNotFoundException: Collect dump was enabled but no dump file was generated.
at Microsoft.TestPlatform.Extensions.BlameDataCollector.ProcessDumpUtility.GetDumpFile()
at Microsoft.TestPlatform.Extensions.BlameDataCollector.BlameCollector.SessionEndedHandler(Object sender, SessionEndEventArgs args).
But when it is followed by an attachments list that includes the dmp file such as the one shown below, the error above is incorrect and can be disregarded:
Attachments:
D:\a\_temp\7d047b47-621b-493e-9a11-ccff70000ce8\testhost.x86_7100_6e0907f276fd4b2ba0d80f1fb5332e89_hangdump.dmp
D:\a\_temp\7d047b47-621b-493e-9a11-ccff70000ce8\Sequence_6e0907f276fd4b2ba0d80f1fb5332e89.xml
Azure Pipeline considerations
The test runner can be invoked from your pipeline directly by invoking the command line runner. When using one of the Microsoft-owned tasks however, special considerations apply, as described below.
VsTest
task usage
As of now this can be used in the vstest task by making the above changes to runsettings and also enabling "advanced diagnostics" with collect dump set to "never" in the task UI if you just want hang dumps. (If both crash and hang detection needs to be enabled then set collect dump to "on abort only").
DotNetCoreCLI
task usage
The DotNetCoreCLI
task will try to upload these files as attachments, but any dump file exceeding 75MB will fail (and full dumps are almost surely going to exceed this limit).
To ensure that dmp files are collected so you can analyze them in the event of a hang/crash, you will need to author the script to capture these TestResults
directories as artifacts.
This will allow you to collect dumps that exceed 75MB.
The DotNetCoreCli
task adds a --results-directory $(Agent.TempDirectory)
switch to the dotnet test
command, so your artifact uploading script will need to search that directory for your test outputs including .dmp files.
It actually drops two copies of the outputs under this directory, so upload just one copy by selecting the files directly under a $(Agent.TempDirectory)\guid
pattern. Using Powershell this can be done like this:
$guidRegex = '^[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$'
$filesToCapture = (Get-ChildItem $env:AGENT_TEMPDIRECTORY -Directory |? { $_.Name -match $guidRegex } |% { Get-ChildItem "$($_.FullName)\testhost*.dmp","$($_.FullName)\Sequence_*.xml" -Recurse })
The dotnet test
CLI will fail if two --results-directory
switches are specified, so it is not possible to override the results directory specified by the DotNetCoreCli
task.
Sample
Review a sample of a pull request to a repo that activates crash and hang dump collection using the DotNetCoreCLI
task.