Don't publish recompute information globally. Also removes the LocationAddRecentInactiveEager concept. That feature has been useless for months because machines are filtered out at both the local and global levels, so doing a proactive global registration doesn't help.
We realized that the `UseInRingMachinesForCopies` feature is not helpful because it generates a lot of extra networking traffic and doesn't improve build reliability.
This PR removes the code that we should not use. If/when we decide we need something like this, we'll implement such a feature from scratch.
This removes the additional line on the console, below the main status line, that gets displayed when pips are paused on resources, e.g.:
`3 running pips waiting on resources`
The data is already shown in the primary status and we want to decrease the amount of scrolling back people have to do in the ADO console view when looking at builds that have already completed.
This PR moves the majority of Pips dependencies out of Processes. All of the Pips usages are BXL specific. Additionally, it moves a bunch of files that are only used by SandboxedProcessPipExecutor or above into ProcessPipExecutor.
After some reflection, we should be more aggressive about limiting the frequency of status updates that go to the ADO console. We are getting customer feedback that it is annoying to scroll through the many lines of status updates for longer builds. This change aims to strike a balance: provide some feedback in case folks want to see status while the build is running, but err on the side of fewer status lines, which aren't relevant once the build is over.
Usage of /RedirectedUserProfileJunctionRoot was thrown off because /enableProfileRedirect must also be set for it to take effect. This change simplifies the config by just removing /enableProfileRedirect. It was not being set anyway, and now that CloudBuild is externally setting the path, there is no longer a need for a CB-specific default path in the BuildXL codebase.
Only ensure that the PTrace files exist and are executable when PTrace is enabled:
- AnyBuild doesn't use PTrace. Missing access reports are compensated for by Fuse.
- In a previous PR I suggested that Pasindu use BuildXL itself as the daemon. In that case we don't have to ensure the existence of `ptracedaemon`.
- The existence and executability of `ptracerunner` still need to be guaranteed, and this check should only be done once.
This PR should allow AnyBuild to ingest new BuildXL without further modification to AnyBuild's file distribution.
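As a rough sketch of the once-only check on the managed side (the helper names, deployment path, and exception are illustrative assumptions, not the actual BuildXL code):

```csharp
using System;
using System.IO;

public static class PTraceDeploymentCheck
{
    // Lazy<T> ensures the existence/executability check runs at most once per process.
    private static readonly Lazy<bool> s_ptraceRunnerReady =
        new Lazy<bool>(() => EnsureDeployedAndExecutable("ptracerunner"));

    public static void VerifyIfNeeded(bool ptraceEnabled)
    {
        if (!ptraceEnabled)
        {
            // AnyBuild and other non-PTrace scenarios skip the check entirely.
            return;
        }

        if (!s_ptraceRunnerReady.Value)
        {
            throw new InvalidOperationException("ptracerunner is missing or not executable");
        }
    }

    private static bool EnsureDeployedAndExecutable(string fileName)
    {
        string path = Path.Combine(AppContext.BaseDirectory, fileName);
        if (!File.Exists(path))
        {
            return false;
        }

        // Make sure the binary carries an execute bit (requires .NET 7+ on Linux).
        File.SetUnixFileMode(path, File.GetUnixFileMode(path) | UnixFileMode.UserExecute);
        return true;
    }
}
```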
Move Counter from BXL.Utilities to Utilities.Core. CounterAttributes are needed by Processes, but it made sense to just move all of Counter to Utilities.Core.
Processes.Remoting is currently only used by ProcessPipExecutor. It must be moved out of Processes due to its dependency on Pips, so it is moved to its own DLL since it's separate from both Processes and ProcessPipExecutor.
The previously developed status update throttling feature only applied to the "##vso[task.setprogress" messages that are interpreted by ADO. This change refactors that throttling to extend it to the user-visible status lines on the console. The goal is to decrease the number of status updates that people scroll through on ADO.
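A minimal sketch of the kind of time-based throttling described here (the type name, interval, and usage are illustrative, not the actual BuildXL implementation):

```csharp
using System;

public sealed class StatusThrottler
{
    private readonly TimeSpan _minInterval;
    private DateTime _lastEmitUtc = DateTime.MinValue;

    public StatusThrottler(TimeSpan minInterval) => _minInterval = minInterval;

    // Returns true only if enough time has passed since the last emitted status line.
    public bool ShouldEmit(DateTime nowUtc)
    {
        if (nowUtc - _lastEmitUtc < _minInterval)
        {
            return false;
        }

        _lastEmitUtc = nowUtc;
        return true;
    }
}

// Usage: gate both the "##vso[task.setprogress]" messages and the user-visible status
// lines behind the same throttler so the ADO log stays short.
// if (throttler.ShouldEmit(DateTime.UtcNow)) { Console.WriteLine(statusLine); }
```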
The process might clear the environment, so we need to propagate this variable on exec. Also, let's check this variable every time, even if the FAM flag is not set, to fail fast if it is unexpectedly absent: we set the variable even when PTrace mode is not enabled, so it should always be there.
Related work items: #2041253
Measure the time it takes to push outputs to the cache for each pip and record it in the perf info object used by the scheduler. Add this counter as a sub-step of ExecuteProcess in the main log perf summary.
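A minimal sketch of the measurement, with hypothetical names for the cache call; the real code records the result on the scheduler's per-pip perf info and surfaces it under ExecuteProcess:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class PushOutputsTiming
{
    // Wraps the (hypothetical) push-to-cache call and returns how long it took, so the
    // caller can record it as a sub-step of ExecuteProcess in the perf summary.
    public static async Task<TimeSpan> MeasureStoreOutputsAsync(Func<Task> storeOutputsToCacheAsync)
    {
        var stopwatch = Stopwatch.StartNew();
        await storeOutputsToCacheAsync();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```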
The FIFO might still hold some unread reports when the process tree has finished, so it's incorrect to stop processing the reports as soon as we find out that all the processes for the pip have exited. Doing this causes reports to be dropped, messing with the observations for the pip.
What we do know at the point where we call `RequestStop` is that there will be no more reports pushed to the FIFO (because all the processes have exited, there are no more possible writes), so we can send a special value through it to signal the FIFO-consuming thread that it can finish. This is preferred over just disposing our keep-alive write handle to produce an EOF in the consumer (which was our original strategy), because the `Dispose` on that handle produces the EOF with a considerable delay.
Related work items: #2021040
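A conceptual sketch of the sentinel approach described above; the report format and the sentinel value are made up for illustration, only the control flow mirrors the description:

```csharp
using System;
using System.IO;

public static class ReportFifo
{
    private const int StopSentinel = -1; // reserved value that can never be a real report length

    public static void RequestStop(Stream fifoWriteHandle)
    {
        // No more reports can arrive at this point, so the sentinel is the last value written.
        Span<byte> buffer = stackalloc byte[sizeof(int)];
        BitConverter.TryWriteBytes(buffer, StopSentinel);
        fifoWriteHandle.Write(buffer);
        fifoWriteHandle.Flush();
    }

    public static void ConsumeReports(BinaryReader fifoReader, Action<int> processReportOfLength)
    {
        while (true)
        {
            int length = fifoReader.ReadInt32();
            if (length == StopSentinel)
            {
                // Stop immediately instead of waiting for an EOF from the write handle's Dispose.
                return;
            }

            processReportOfLength(length);
        }
    }
}
```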
Capture observed inputs for failed & retried pips in the XLG.
Let ObservedInputProcessor return observed inputs for failed pips and DFA pips.
Add two new kinds of FingerprintComputation so that the runtime analyzer ignores failed pips' fingerprint information but still does the computation for DFA pips.
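Roughly, the distinction could look like the following enum sketch; the member names are hypothetical, only the "two new kinds" idea comes from this PR:

```csharp
// Hypothetical names: the point is that the runtime analyzer can tell apart
// "failed, ignore the fingerprint" from "DFA pip, still compute it".
public enum FingerprintComputationKind
{
    Execution,          // normal successful execution
    CacheCheck,         // cache lookup
    ExecutionFailed,    // new: fingerprint info is ignored by the runtime analyzer
    ExecutionDfa,       // new: pip had a DFA; the computation is still performed
}
```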
When we are approaching a CB timeout, we trigger a cancellation token. As a result, we cancel all running pips as well as the post-processing of all finished pips (even if they were successful). This PR fixes the assert so we don't crash during build termination.
Related work items: #2028490
The implementation of rename (and renameat) in the Linux sandbox was generating accesses for the destination files without the write flag, and therefore they were interpreted as reads.
- This change allows ptrace to run outside of the process tree of the pip, reducing any interference we have in the process execution. The previous iteration of the sandbox had issues with inheriting handles from the parent process it was forked from, causing pips to hang.
- Adds the ptracedaemon service that runs in the background to listen to ptrace requests from processes running under the interpose sandbox.
- When a request is received, the ptracerunner is executed in a separate process to attach to the tracee using PTRACE_ATTACH.
Related work items: #2037038
The breakdown of process overhead is just displaying the PipStep. But the ExecuteProcess PipStep can have quite a bit more work than just having the child process active. Grab a couple of counters from constituent parts of this overhead and include them in the appropriate part of the breakdown instead of under "other".
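Illustratively, the "other" bucket shrinks once known sub-steps are attributed explicitly; the counter names below are hypothetical:

```csharp
using System;

public static class ExecuteProcessBreakdown
{
    // Whatever is not covered by an explicit counter still lands in "other".
    public static TimeSpan ComputeOther(
        TimeSpan executeProcessStep,
        TimeSpan childProcessActive,
        TimeSpan pushOutputsToCache)
    {
        TimeSpan other = executeProcessStep - childProcessActive - pushOutputsToCache;
        return other < TimeSpan.Zero ? TimeSpan.Zero : other;
    }
}
```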
AnyBuild started failing with the following error after updating to the latest bxl bits:
```
2023-03-11 05:48:57,219 [18] ERROR root: An exception is unhandled. Process will exit.
BuildXL.Utilities.Serialization.InsufficientLengthException: The buffer should have at least 2 length but has 0.
   at BuildXL.Utilities.Serialization.InsufficientLengthException.Throw(Int32 minLength, Int32 remainingLength) in \.\Public\Src\Utilities\Utilities\Serialization\Insuffic>
   at BuildXL.Utilities.Serialization.SpanWriter.DoEnsureLength(Int32 minLength) in \.\Public\Src\Utilities\Utilities\Serialization\SpanWriter.cs:line 187
   at BuildXL.Utilities.Serialization.SpanWriter.EnsureLength(Int32 minLength) in \.\Public\Src\Utilities\Utilities\Serialization\SpanWriter.cs:line 159
   at BuildXL.Utilities.Serialization.SpanSerializationExtensions.Write[T](SpanWriter& writer, T value) in \.\Public\Src\Utilities\Utilities\Serialization\SpanSerializatio>
   at BuildXL.Cache.ContentStore.Distributed.MetadataService.RocksDbOperations.WriteMergeLocations(SpanWriter& mergeWriter, SpanReader& reader1, SpanReader& reader2, Boole>
   at BuildXL.Cache.ContentStore.Distributed.MetadataService.RocksDbOperations.WriteMergeLocations(SpanWriter& mergeWriter, ReadOnlySpan`1 value1, ReadOnlySpan`1 value2, B>
   at BuildXL.Cache.ContentStore.Distributed.MetadataService.RocksDbOperations.MergeLocations(ReadOnlySpan`1 key, ReadOnlySpan`1 value1, ReadOnlySpan`1 value2, MergeResult>
   at BuildXL.Cache.ContentStore.Distributed.MetadataService.RocksDbOperations.ProcessSingleLocationEntry(ReadOnlySpan`1 key, ReadOnlySpan`1 value, MergeResult result) in >
   at RocksDbSharp.MergeOperators.<>c__DisplayClass4_0.<CreateAssociative>g__mergeCore|1(ReadOnlySpan`1 key, MergeOperandsEnumerator operands, MergeValueBuffers buffers, R>
   at RocksDbSharp.MergeOperators.<>c__DisplayClass4_0.<CreateAssociative>b__3(ReadOnlySpan`1 key, NullableReadOnlySpan`1 value, MergeOperandsEnumerator operands, MergeVal>
   at RocksDbSharp.MergeOperators.MergeOperatorImpl.RocksDbSharp.MergeOperator.FullMerge(IntPtr key, UIntPtr keyLength, IntPtr existingValue, UIntPtr existingValueLength, >
   at RocksDbSharp.ColumnFamilyOptions.MergeOperator_FullMerge(IntPtr state, IntPtr key, UIntPtr keyLength, IntPtr existingValue, UIntPtr existingValueLength, IntPtr opera>
```
To mitigate this issue, this PR makes the following changes:
* Increases the target buffer size for merged location entries
* Adds error handling to trace the error and not crash the app.
The root cause of this crash was a pip with 8 million dependencies. That was unintended in spec generation and has been addressed, but I figured it was still good to have some form of handling in case that pops up again.
Originally I wanted to handle this in a more elegant way where we would keep fingerprint text that was truncated after a certain point. But that would not be valid JSON. Truncating cleanly while still keeping the JSON valid is more effort than it's worth, so I'm just going with logging a warning and moving on.
Related work items: #2036600
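The "trace instead of crash" handling described above boils down to a pattern like this sketch; the delegates and message are stand-ins, not the actual RocksDbOperations code:

```csharp
using System;

public static class MergeErrorHandling
{
    // Runs the merge and reports failures instead of letting the exception unwind
    // through the native RocksDb merge callback and take the whole process down.
    public static bool TryMerge(Action merge, Action<string> traceError)
    {
        try
        {
            merge();
            return true;
        }
        catch (Exception e)
        {
            // e.g. an oversized entry produced by a pip with millions of dependencies
            traceError($"Failed to merge location entries: {e}");
            return false;
        }
    }
}
```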