Moving the CopyOnWrite functionality to a separate dll. This will also break the dependency on the CopyOnWrite package to prevent a diamond dependency when Processes gets used externally.
Change cache lookup stats logging to reflect whether the lookup happened using an augmented weak fp (rather than whether the lookup *was* performed for an augmented weak fp). As a result, when we now print
`... (augmented: value), Visited entries: x, Unique pathsets: y)`
'value' carries an arguably more intuitive meaning: whether the lookup being printed was performed with a regular weak fingerprint or an augmented one.
Account for the old `Microsoft.CodeAnalysis` library at runtime to fix builds such as https://cloudbuild.microsoft.com/build/ecdfb649-8b1d-10b1-e998-c9a9c2fe59b6?bq=Bvt_Skype_Services_Messenger_CodeSign, which emit the following type of error:
```
error MSB4017: The build stopped unexpectedly because of an unexpected logger failure.
1>Microsoft.Build.Exceptions.InternalLoggerException: The build stopped unexpectedly because of an unexpected logger failure. ---> System.MissingMethodException: Method not found: 'System.Collections.Immutable.ImmutableArray`1<Microsoft.CodeAnalysis.CommandLineSourceFile> Microsoft.CodeAnalysis.CommandLineArguments.get_EmbeddedFiles()'.
```
This change will enable more repos to utilize the shared compilation feature.
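One way to account for the old library at runtime (a hypothetical sketch, not necessarily this PR's actual mechanism) is to probe via reflection for the member whose absence triggers the MissingMethodException above, and skip that code path when it's missing:

```csharp
using System.Reflection;

// Hypothetical sketch: detect at runtime whether the loaded
// Microsoft.CodeAnalysis exposes CommandLineArguments.EmbeddedFiles, and
// avoid that code path when it doesn't (older versions of the library
// lack the property, which causes the MissingMethodException).
internal static class CodeAnalysisCompat
{
    public static bool SupportsEmbeddedFiles(object commandLineArguments)
    {
        PropertyInfo property = commandLineArguments.GetType()
            .GetProperty("EmbeddedFiles", BindingFlags.Public | BindingFlags.Instance);
        return property != null;
    }
}
```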
Also, enabled nullability and fixed style issues.
Related work items: #2036773
A multi-layer cache can result in the same (wf -> augmented wf) pair returned more than once. Today this means we will go through all the candidates associated with the augmented wf as many times as duplicate pairs come back. This PR shortcuts that search when it sees the same augmented wf coming again for the same lookup.
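A minimal sketch of the shortcut, with hypothetical names (the real code operates on fingerprint types rather than strings):

```csharp
using System;
using System.Collections.Generic;

internal static class AugmentedLookup
{
    // Search each augmented weak fingerprint's candidates at most once per
    // lookup, even if several cache layers return the same (wf -> augmented wf) pair.
    public static void SearchOnce(
        IEnumerable<string> augmentedWfsFromAllLayers,
        Action<string> searchCandidates)
    {
        var visited = new HashSet<string>(StringComparer.Ordinal);
        foreach (string augmentedWf in augmentedWfsFromAllLayers)
        {
            // HashSet.Add returns false for a duplicate coming from another
            // layer; its candidates were already enumerated, so skip them.
            if (visited.Add(augmentedWf))
            {
                searchCandidates(augmentedWf);
            }
        }
    }
}
```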
Sorry, no tests for this optimization. The only way to get duplicate entries is to have a cache with at least two layers, which is pretty complicated with the test infra we have today.
Ensure %userprofile% is set for process remoting. This supports an optimization in the Windows sandboxes where we pre-create c:\users\username\... (something like 17 dirs) to avoid I/O during the critical path.
Plus some refactoring carried over from the BuildXL.Process refactoring.
Associated PR: !710754
Folding all of Collections into BXL.Utilities.Core. The recommended way to create NuGet packages is one dll per package unless the dlls are always meant to be used together. Collections is part of Utilities and shares a large subset of dependents with Utilities.Core, so merging them reduces the number of dlls.
Noticed that a few more files are used only by ProcessPipExecutor, so moving those there instead. Additionally, pulling out all the VM and ExternalSandbox code into a separate dll, as it's not used by Processes and doesn't fit in ProcessPipExecutor.
Processes depends on a very small portion of BXL.Utilities.Configurations, and most of those dependencies are BXL-specific as well, so it makes sense to just break this dependency.
Office would like to experiment with a light version of BuildXL in their PR builds. The first step is to add an option to disable XLG events from workers to the orchestrator. Manifest events will still be sent, though.
Refactoring Instrumentation.Common to contain only Aria-specific files, and moving all the non-Aria code into Utilities.Core instead. The motivation is that BXL released packages won't contain Aria packages, which are external. We want to depend on as few external packages as possible because they can cause diamond dependencies (e.g., if BXL.Utilities depends on Aria 1.0, ExPkg depends on Aria 1.1, and a customer pulls both in, there will be a version conflict).
There are two scenarios in which a build will have work waiting on resources:
1. When the scheduler limits have been throttled due to available resources
2. When the scheduler must actively cancel some executing processes due to resource availability.
We see negative numbers in the pipsWaitingOnResources value in scenario 2. Here, the scheduler temporarily decreases TotalProcessSlots as a signal to cancel some of the actively running processes. In this case AcquiredProcessSlots will actually be larger than the total, so the prior calculation yielded a negative number. This is only temporary while the cancellation is happening: once cancelled, AcquiredProcessSlots no longer exceeds TotalProcessSlots, and the cancelled pips land back in the ChooseWorkerCpu PipExecutionStep. That's why the negative number is only ever transient.
Calculating based on LocalWorker.TotalProcessSlots - LocalWorker.AcquiredProcessSlots is clearly wrong while the cancellation is happening. It is also incorrect during steady state, whether the build is throttled or unthrottled: when a machine is maxed out, the two values are equal and the difference is zero, so Math.Min() picks that 0 even though there may still be many pips in the ChooseWorkerCpu step.
It seems that directly using the count of pips in the ChooseWorkerCpu step is the appropriate thing to do.
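An illustrative before/after of the calculation (names follow the text above; the real scheduler code differs):

```csharp
using System;

internal static class PipsWaitingOnResources
{
    public static long Old(long pipsInChooseWorkerCpu, long totalProcessSlots, long acquiredProcessSlots)
    {
        // Goes negative while cancellation temporarily makes acquired > total,
        // and reads 0 on a maxed-out machine even with many pips still waiting.
        return Math.Min(pipsInChooseWorkerCpu, totalProcessSlots - acquiredProcessSlots);
    }

    public static long New(long pipsInChooseWorkerCpu)
    {
        // Report the count of pips sitting in the ChooseWorkerCpu step directly.
        return pipsInChooseWorkerCpu;
    }
}
```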
Related work items: #2041760
Currently, to check whether statx is supported, we check the major/minor version of the OS. However, that is not enough: we also need to check whether the installed glibc supports statx.
We could call `ldd --version` and parse its output to determine the glibc version, or call `gnu_get_libc_version()` to get it directly. In this PR we instead simply invoke statx on the executing assembly and check for an `EntryPointNotFoundException`.
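A minimal sketch of that probe, assuming a hand-rolled P/Invoke (the PR's exact signature and buffer handling may differ):

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

internal static class StatxProbe
{
    // P/Invoke into the glibc statx wrapper; resolving the symbol fails with
    // EntryPointNotFoundException when the installed glibc is too old.
    [DllImport("libc", SetLastError = true)]
    private static extern int statx(int dirfd, string pathname, int flags, uint mask, IntPtr statxbuf);

    public static bool IsStatxSupported()
    {
        // struct statx is 256 bytes; we only care whether the symbol resolves.
        IntPtr buffer = Marshal.AllocHGlobal(256);
        try
        {
            statx(0, Assembly.GetExecutingAssembly().Location, 0, 0, buffer);
            return true;
        }
        catch (EntryPointNotFoundException)
        {
            return false;
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```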
Related work items: #2043763
Updated Tool.QTestRunner.dsc to use qTestMsTestPlatformRootPathValue, a path value, for --msBuildToolsRoot so that the input to QTest, qTestMsTestPlatformRootPath, can be different from the qTestMsTestPlatformRootPathValue path. This happens when qTestMsTestPlatformRootPath is an opaque from a NuGet package and qTestMsTestPlatformRootPathValue is a subdirectory of it.
This PR:
- Adds the concept of a `Topology`. The `Topology` is responsible for handing out a storage client for any given object in the system. Currently we have only one implementation, the `ShardedTopology` (a sharding scheme points to a specific storage account).
- Adds the concept of an `IShardingScheme` over a set of nodes (i.e., "how do I distribute objects uniformly across shards?"); see the sketch after this list. The `ShardedTopology` uses this interface to point to a specific storage account.
- Adds the concept of an `IStorageCacheSecretsProvider`. This interface is used by the `Topology` to obtain storage secrets for all the involved storage accounts / containers. The intention is that we'd be able to obtain container-specific SAS tokens to scope security.
- Rewrites `BlobFolderStorage` into `BlobStorageClientAdapter`. Instead of using the concept of paths and so on as before, this class now takes an actual client and essentially adds a set of methods you can run against storage. It's not an extension class because there are several shared fields.
- The concept of using a single folder inside of Azure Storage has been split off into `AzureBlobStorageFolder` (which now takes care of basically handing out storage clients).
- Renames `AzureBlobStorageCredentials` into `AzureStorageCredentials` (because this will be used for queue in the future).
- Removes bulk pin from Blob L3; it doesn't work.
- Adds test coverage to memoization operations
- Adds bicep scripts for provisioning L3 instances
- Flattens the serialization format for content and metadata: they are now at the root of the container instead of inside folders. This is on purpose, to allow for longer names, which is particularly important for Selectors inside StrongFingerprint.
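A hypothetical sketch of what such a sharding scheme could look like; the names and the actual interface in the PR likely differ:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical shape of the sharding abstraction: map a key (e.g., a blob
// name or content hash) to one of N shards, each backed by a storage account.
public interface IShardingScheme<in TKey, out TShard>
{
    TShard Locate(TKey key);
}

// A simple uniform scheme: hash the key and take it modulo the shard count.
public sealed class ModShardingScheme : IShardingScheme<string, int>
{
    private readonly int _shardCount;

    public ModShardingScheme(int shardCount) => _shardCount = shardCount;

    public int Locate(string key)
    {
        using var sha = SHA256.Create();
        byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(key));
        // The first 4 bytes of the digest give a stable, uniform index.
        return (int)(BitConverter.ToUInt32(digest, 0) % (uint)_shardCount);
    }
}
```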
[WIP]
- Deploy to CBTest to ensure nothing here breaks our current way of working. This is only needed because of the `BlobFolderStorage` change.
Add retries when opening a file fails with UnauthorizedAccessException.
The PR also changes the exception type thrown due to a sharing violation on Linux. Before, the code failed with `IOException` on Linux and `UnauthorizedAccessException` on Windows; now the behavior is consistent.
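A minimal sketch of the retry pattern (the helper name and retry policy are illustrative, not the PR's actual code):

```csharp
using System;
using System.IO;
using System.Threading;

internal static class FileOpener
{
    public static FileStream OpenWithRetries(string path, int maxAttempts = 3, int delayMs = 100)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
            }
            catch (UnauthorizedAccessException) when (attempt < maxAttempts)
            {
                // Transient denials (e.g., sharing violations surfaced as
                // UnauthorizedAccessException) may clear up; back off and retry.
                // The last attempt lets the exception propagate.
                Thread.Sleep(delayMs * attempt);
            }
        }
    }
}
```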
Related work items: #2039887
Don't publish recompute information globally. Also removes the concept of LocationAddRecentInactiveEager. That feature has been useless for months because machines are filtered out at both the local and global levels, so doing a proactive global registration doesn't help.
We realized that the `UseInRingMachinesForCopies` feature is not helpful: it generates a lot of extra networking traffic and doesn't improve build reliability.
This PR removes the code that we should not use. If/when we decide we need something like it, we'll implement such a feature from scratch.
This removes the additional line on the console, below the main status line, that gets displayed when pips are paused on resources, e.g.:
`3 running pips waiting on resources`
The data is already shown in the primary status line, and we want to decrease the amount of scrolling back people have to do in the ADO console view when looking at builds that have already completed.