Update NuGet references so the whole closure is within the specified version ranges. After we get the new LKG we can turn off `doNotEnforceDependencyVersions: true`.
Add a tool that validates the configuration to be deployed to AdoBuildConfig. This will be used to gate changes in that repository so that we always deploy configurations compatible with the installer.
Usage: `1ES.BuildToolsInstaller.Exe configcop --path <path-to-config-to-validate>`
Generally, output streams are handled in chunks so that we never read everything into memory at once. However, a check added back in commit 0167f82124 violated this for the sake of detecting whether the output was truncated. This change reworks that handling so it no longer calls ReadValueAsync(), which reads in the entire stream.
I also changed ReadValueAsync to truncate rather than risk an OOM, since it is called in a number of places where we probably don't want to bother with elaborate chunking behavior.
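For illustration, a minimal sketch of a truncating, chunked read; the method name, chunk size, and limit below are assumptions for the example, not the actual BuildXL code:

```csharp
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class BoundedStreamReader
{
    // Hypothetical limit; the real limit is supplied by the caller.
    private const int MaxBytes = 10 * 1024 * 1024;

    /// <summary>
    /// Reads the stream in fixed-size chunks and stops once the limit is reached,
    /// so a very large output truncates instead of exhausting memory.
    /// (The sketch ignores multi-byte character boundaries for brevity.)
    /// </summary>
    public static async Task<(string Value, bool Truncated)> ReadValueTruncatedAsync(Stream stream)
    {
        var buffer = new byte[64 * 1024];
        var builder = new StringBuilder();
        long total = 0;

        int read;
        while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            if (total + read > MaxBytes)
            {
                // Keep only the bytes that fit under the limit and report truncation.
                read = (int)(MaxBytes - total);
                builder.Append(Encoding.UTF8.GetString(buffer, 0, read));
                return (builder.ToString(), true);
            }

            builder.Append(Encoding.UTF8.GetString(buffer, 0, read));
            total += read;
        }

        return (builder.ToString(), false);
    }
}
```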
Related work items: #2229870
We'd like to run some worker count experiments for Windows builds. We don't currently have a way to have CloudBuild do this, so a workaround is to utilize the engine's abtest flag with a mode that immediately releases a configurable number of workers.
This is somewhat wasteful, since it spends machine time preparing a worker that is eventually released, but it is much easier to implement at this level. Moreover, it translates to ADO-based distributed builds.
NuGet packages can specify dependency groups that are conditioned on a given target framework, e.g.:
```xml
<dependencies>
  <group targetFramework="net6.0">
    <dependency id="jQuery" version="1.10.2" />
    <dependency id="WebActivatorEx" version="2.2.0" />
  </group>
</dependencies>
```
The semantics of the condition are 'any target framework compatible with net6.0', not just net6.0 itself. However, our NuGet spec generation logic was adding these dependencies **only** for net6.0. This was causing some dependent NuGet packages to be missing at deployment time. So far, we've been compensating by manually adding the missing packages as explicit imports, but this is a risky practice since some packages might be required only at runtime, and that might become evident only under particular code paths.
Modify the dependency generation logic to include all compatible frameworks with respect to the specified framework dependency. This is a relatively safe change because at most we will deploy *more* libraries than before. It should also allow us to clean up specs considerably after this reaches the LKG.
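As an illustration of the expansion, a hedged sketch using the NuGet.Frameworks compatibility provider; the surrounding class, method, and parameter names are assumptions, only the NuGet.Frameworks types are real:

```csharp
using System.Collections.Generic;
using System.Linq;
using NuGet.Frameworks;

public static class DependencyGroupExpansion
{
    /// <summary>
    /// Given the frameworks a generated spec targets and a dependency group's target
    /// framework, returns every spec framework the group applies to (every framework
    /// compatible with the group's framework), not just an exact match.
    /// </summary>
    public static IEnumerable<NuGetFramework> CompatibleFrameworks(
        IEnumerable<NuGetFramework> specFrameworks,
        NuGetFramework groupFramework)
    {
        var compat = DefaultCompatibilityProvider.Instance;

        // Example: net8.0 is compatible with a net6.0 dependency group, so the group's
        // dependencies must also be emitted for net8.0.
        return specFrameworks.Where(f => compat.IsCompatible(f, groupFramework));
    }
}
```

A quick usage check under these assumptions: `CompatibleFrameworks(new[] { NuGetFramework.Parse("net472"), NuGetFramework.Parse("net8.0") }, NuGetFramework.Parse("net6.0"))` would yield only `net8.0`, since net472 cannot consume net6.0 assets.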
Expose per-pip timeout for JavaScript resolvers
Use pipTimeoutForProject to set the timeout and warning timeout for specific projects
Add integration tests
Related work items: #2201737
Add an 'orchestrator' and 'worker' mode to the BuildXLNugetInstaller so that the version is decided only on a single agent in a distributed build: this avoids race conditions between the agents when resolving a version.
In the future this strategy might also be adopted for the configuration download itself, but a prerequisite for that is that the configuration is distributed in a versioned manner (like NuGet), so for now it is good enough to keep this logic specific to the BuildXL installer.
This PR also adds an integration pipeline to test the installer before releasing. This pipeline exercises a straightforward flow and also checks worker-orchestrator consistency when using the distributed mode.
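For illustration only, a minimal sketch of the orchestrator/worker split, assuming the resolved version can be published to some location reachable by every agent; the exchange mechanism and every name below are assumptions, not the installer's actual implementation:

```csharp
using System;
using System.IO;
using System.Threading;

public static class VersionCoordination
{
    // Orchestrator mode: resolve the version exactly once and publish it.
    public static void RunOrchestrator(string exchangeFile, Func<string> resolveVersion)
    {
        var version = resolveVersion();
        File.WriteAllText(exchangeFile, version);
    }

    // Worker mode: never resolve locally; wait for the orchestrator's value so all
    // agents agree on the same version and races are avoided.
    public static string RunWorker(string exchangeFile, TimeSpan timeout)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            if (File.Exists(exchangeFile))
            {
                return File.ReadAllText(exchangeFile).Trim();
            }

            Thread.Sleep(TimeSpan.FromSeconds(5));
        }

        throw new TimeoutException("The orchestrator did not publish a resolved version in time.");
    }
}
```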
Changes to the APH version; all validations passed.
We also do not see any issues with the package on Ubuntu 24.04.
The globalization issue seen earlier with the older version of APH has also been resolved.
Related work items: #2227478
Produce an error file for the Yarn and Lage resolvers (so they are on par with the Rush resolver).
Gate this change in behavior behind an optional argument so we don't break Office consumption.
When bxl is running under Linux without an X server available, the regular interactive browser credential is not an option for cache auth. Use device code authentication in that case.
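For illustration, a minimal sketch of that selection using Azure.Identity; the DISPLAY-based heuristic and the helper name are assumptions, and the actual credential setup in bxl may differ:

```csharp
using System;
using Azure.Core;
using Azure.Identity;

public static class CacheAuth
{
    /// <summary>
    /// Picks an interactive credential suited to the current environment: device code flow
    /// when no X server/display is available, browser-based sign-in otherwise.
    /// </summary>
    public static TokenCredential CreateInteractiveCredential()
    {
        bool isLinux = OperatingSystem.IsLinux();
        bool hasDisplay = !string.IsNullOrEmpty(Environment.GetEnvironmentVariable("DISPLAY"));

        if (isLinux && !hasDisplay)
        {
            // Prints a code and a URL to the console; the user completes sign-in elsewhere.
            return new DeviceCodeCredential();
        }

        // Opens the system browser for the usual interactive sign-in.
        return new InteractiveBrowserCredential();
    }
}
```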
Align the worker id with what is shown in the ADO pipelines UI (worker #1 at the top, log names, etc.). Do this by using the predefined System.JobPositionInPhase variable to select a particular worker id.
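A minimal sketch of deriving a worker id from that variable; the orchestrator-at-position-1 numbering below is an assumption for illustration:

```csharp
using System;

public static class WorkerIdAssignment
{
    /// <summary>
    /// Derives a worker id from the predefined System.JobPositionInPhase variable
    /// (exposed to processes as the SYSTEM_JOBPOSITIONINPHASE environment variable),
    /// so worker numbering matches the order shown in the ADO pipeline UI.
    /// </summary>
    public static int GetWorkerId()
    {
        string position = Environment.GetEnvironmentVariable("SYSTEM_JOBPOSITIONINPHASE");
        if (!int.TryParse(position, out int jobPosition))
        {
            throw new InvalidOperationException("SYSTEM_JOBPOSITIONINPHASE is not set; not running in an ADO job?");
        }

        // Assumed mapping: the orchestrator occupies position 1, so workers start at
        // position 2 and worker ids are 1-based to match 'worker #1' in the UI.
        return jobPosition - 1;
    }
}
```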
Related work items: #2226709
BuildXL is not able to dump a hung process due to a "Process with an Id of N is not running" error, even though the `GetExitCodeProcess` call made just before it did not return success and had just created a handle, so I'm fairly sure this case wasn't a race condition. Using this PR's changes I confirmed that the process was in fact still running and that `Process.GetProcessById` was too eager in its diagnosis that the process had completed. I was able to dump the process without this method, so avoiding this call may be the best way to capture these kinds of process dumps.
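As an illustration of checking liveness without going through Process.GetProcessById, a hedged sketch that queries the process handle directly via P/Invoke (not necessarily what this PR implements):

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

public static class ProcessLiveness
{
    private const uint PROCESS_QUERY_LIMITED_INFORMATION = 0x1000;
    private const uint STILL_ACTIVE = 259;

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern IntPtr OpenProcess(uint desiredAccess, bool inheritHandle, int processId);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool GetExitCodeProcess(IntPtr process, out uint exitCode);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool CloseHandle(IntPtr handle);

    /// <summary>
    /// Determines whether a process is still running by opening a handle and checking
    /// its exit code, instead of relying on Process.GetProcessById, which can report a
    /// live process as not running.
    /// </summary>
    public static bool IsRunning(int processId)
    {
        IntPtr handle = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, false, processId);
        if (handle == IntPtr.Zero)
        {
            return false; // No such process (or no access).
        }

        try
        {
            if (!GetExitCodeProcess(handle, out uint exitCode))
            {
                throw new Win32Exception(Marshal.GetLastWin32Error());
            }

            return exitCode == STILL_ACTIVE;
        }
        finally
        {
            CloseHandle(handle);
        }
    }
}
```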
Related work items: #2221778
This pull request addresses two bugs encountered when the Content Addressable Storage (CAS) is mounted separately from the outputs on Linux:
1. Temporary File Deletion: If TryInKernelFileCopyAsync fails, the temporary file is now deleted. Previously, this temp file would prevent CopyWithStreamAsync from copying the file.
2. Hardlink Creation: When creating a hardlink for content already in the cache, the process attempts to delete the original file and replace it with a new one that includes the hardlink. If this operation failed, the original file was deleted, and the fallback to copying would return true because the content is already in the store, yet the pip would fail in later operations due to the missing file. Now, if the file has been deleted, the original file is re-created using the existing content from the cache (a sketch of this fallback is shown after this list).
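A minimal sketch of the fixed fallback described in item 2; the helper delegates and all names here are assumptions for illustration and do not correspond to the actual cache API surface:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class HardlinkPlacement
{
    public static async Task<bool> PlaceAsHardlinkOrRestoreAsync(
        string cachePath,
        string destinationPath,
        Func<string, string, bool> tryCreateHardlink,   // hypothetical hardlink helper
        Func<string, Task> placeFromCacheAsync)          // hypothetical "copy from cache" helper
    {
        // Replacing the destination requires deleting it first; remember whether we did.
        bool destinationDeleted = false;
        if (File.Exists(destinationPath))
        {
            File.Delete(destinationPath);
            destinationDeleted = true;
        }

        if (tryCreateHardlink(cachePath, destinationPath))
        {
            return true;
        }

        // Previously a failure here left the destination missing even though the content
        // was already in the store; now the original file is re-created from the cache.
        if (destinationDeleted)
        {
            await placeFromCacheAsync(destinationPath);
        }

        return File.Exists(destinationPath);
    }
}
```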
Related work items: #2216529, #2219745
Added an uncacheableExitCodes property to Process.cs.
When a pip exits with one of these codes its result is not cached, even though the pip is still treated as successful.
Added a corresponding integration unit test.
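A minimal sketch of the intended semantics; the class and member names below are assumptions, and the real scheduler logic lives elsewhere:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class ProcessCachingPolicy
{
    public IReadOnlyCollection<int> SuccessExitCodes { get; set; } = new[] { 0 };
    public IReadOnlyCollection<int> UncacheableExitCodes { get; set; } = Array.Empty<int>();

    public bool IsSuccess(int exitCode) => SuccessExitCodes.Contains(exitCode);

    // A pip can exit successfully yet be excluded from the cache, so it reruns next time.
    public bool ShouldCache(int exitCode) =>
        IsSuccess(exitCode) && !UncacheableExitCodes.Contains(exitCode);
}
```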
Related work items: #2210032
In various environments we hit some amount of pip cancellation and retrying due to memory throttling. At a low level this is not necessarily problematic and shouldn't be in the end user's face. Demote those messages to verbose, and emit a user-facing log only if the total count passes a threshold.
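A minimal sketch of the demote-plus-threshold idea; the threshold value and the logging shape are assumptions for illustration:

```csharp
using System;
using System.Threading;

public sealed class MemoryThrottlingRetryTracker
{
    private readonly int m_warningThreshold;
    private int m_retryCount;

    public MemoryThrottlingRetryTracker(int warningThreshold = 100)
    {
        m_warningThreshold = warningThreshold;
    }

    public void OnPipRetriedDueToMemoryThrottling(Action<string> logVerbose, Action<string> logWarning)
    {
        int count = Interlocked.Increment(ref m_retryCount);

        // Each individual retry is routine and stays out of the user's face.
        logVerbose($"Pip retried due to memory throttling (total so far: {count}).");

        // Surface a single user-facing message only once retries become excessive.
        if (count == m_warningThreshold)
        {
            logWarning($"{count} pips have been retried due to memory throttling; the build may be resource constrained.");
        }
    }
}
```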
We suppress end-user-facing logs on the workers even if they are selected for console redirection. Note that these events are always forwarded to the orchestrator, so they *will* end up in the orchestrator console anyway.
Currently, content hash lists are always replaced in the Blob L3 implementation because AutomaticallyOverwriteContentHashLists is set to false. AutomaticallyOverwriteContentHashLists is poorly named: setting it to false indicates that the content session should not be used to pin content to check availability, which is the check that would prevent replacing content hash lists whose content is still available. The end result is that content hash lists are always replaced. This negatively impacts graph caching and pip caching, because concurrent runs that generate the same fingerprint overwrite each other rather than converging. Graph caching is even more impacted because it relies on entries not being replaced for its fingerprint chaining. This means that similar builds which generate the same initial graph fingerprint but ultimately have different graph fingerprints continuously stomp on each other.
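For illustration, a minimal sketch of the desired convergence behavior; this is not the real cache API, and the in-memory dictionary, string fingerprints, and availability callback are assumptions:

```csharp
using System;
using System.Collections.Concurrent;

public sealed class ConvergingMemoizationStore
{
    private readonly ConcurrentDictionary<string, string> m_entries = new();

    // When an entry for a fingerprint already exists and its content is still available,
    // the existing entry wins and is returned, so concurrent builds converge on one result
    // instead of unconditionally replacing each other's content hash lists.
    public string AddOrGetContentHashList(
        string strongFingerprint,
        string newContentHashList,
        Func<string, bool> isContentAvailable)
    {
        while (true)
        {
            if (m_entries.TryGetValue(strongFingerprint, out var existing))
            {
                if (isContentAvailable(existing))
                {
                    return existing; // Converge on the existing entry.
                }

                // Existing entry's content is gone; replacing it is safe.
                if (m_entries.TryUpdate(strongFingerprint, newContentHashList, existing))
                {
                    return newContentHashList;
                }
            }
            else if (m_entries.TryAdd(strongFingerprint, newContentHashList))
            {
                return newContentHashList;
            }
        }
    }
}
```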