- This change will update bxl to use [MSAL.NET](https://www.nuget.org/packages/Microsoft.Identity.Client) for authentication.
- Update to a single implementation of the authentication code for cache, drop and symbol.
- Changes AAD prompt behaviour to prompt in a new browser window when silent authentication fails. This should eliminate the need for a PAT/credential provider on local builds.
Related work items: #1908175
When choosing a worker for a module, we fill the preferred worker up to 2x its capacity before moving on to the next workers, because running a module on another worker can be expensive. We have not experimented with the load factor for module affinity, so I want to do some A/B testing to find a better one. The goal is a sweet spot between end-to-end duration and the size of the materialized inputs.
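As a rough illustration of the selection rule described above, here is a minimal Python sketch. The `Worker` type, the `choose_worker` function, and the notion of capacity as a simple pip count are all hypothetical and are not taken from the BuildXL scheduler.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Worker:
    # Hypothetical model: capacity is the nominal number of concurrent pips.
    name: str
    capacity: int
    assigned: int = 0


def choose_worker(preferred_order: Iterable[Worker], load_factor: float = 2.0) -> Optional[Worker]:
    """Return the first worker, in preference order, that still has room
    under capacity * load_factor; return None if every worker is full."""
    for worker in preferred_order:
        if worker.assigned < worker.capacity * load_factor:
            return worker
    return None
```

With a load factor of 2.0 and two workers of capacity 2, the first four pips of a module land on the preferred worker and only the fifth spills over. Lowering the load factor spreads the work sooner at the cost of materializing inputs on more machines.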
The original implementation for out-of-proc casaas was based on environment variables that are visible to every user.
This PR uses memory-mapped files for communicating secrets with an ability to update them on the fly.
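To make the idea concrete, the following Python sketch stores a versioned secret in a memory-mapped region and reads it back after an update. The header layout (version plus payload length) and the function names are assumptions made for illustration; they do not describe the actual format used by the PR, and the sketch omits the cross-process synchronization a real implementation would need.

```python
import mmap
import struct

# Hypothetical layout: 4-byte version, 4-byte payload length, then the payload.
HEADER = struct.Struct("<II")


def write_secret(region: mmap.mmap, version: int, secret: bytes) -> None:
    """Overwrite the region with a new version of the secret."""
    region.seek(0)
    region.write(HEADER.pack(version, len(secret)) + secret)


def read_secret(region: mmap.mmap) -> tuple:
    """Return (version, secret) as currently stored in the region."""
    version, length = HEADER.unpack_from(region, 0)
    return version, bytes(region[HEADER.size:HEADER.size + length])
```

Unlike an environment variable, which is fixed when the process starts, a reader of the mapped region sees the new value the next time it calls `read_secret`.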
Related work items: #1908518
We make an effort to associate errors with the pips that generated them, but some events are missing the pip description / pip id needed for this association.
Related work items: #1807760
This PR adds support for changing the underlying storage of ClusterState from Redis to Azure Blob Storage. It does so by moving the logic that updates ClusterState out of LLS and into a ClusterStateManager. The ClusterStateManager has an IClusterStateStorage, which can be Redis, Azure Storage, or a Transitional layer used only for the transition.
In Transitional mode, machines are force-registered in Azure Storage with the same machine mappings as set by Redis, and all operations are also executed against storage (although their results are ignored). This paves the way for moving to the storage-only mode, in which we only read from storage.
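The Transitional mode can be pictured as a wrapper over two storages. The Python sketch below is only a model of that idea: `InMemoryStorage`, `TransitionalStorage`, and their method names are hypothetical stand-ins, not the actual IClusterStateStorage interface.

```python
class InMemoryStorage:
    """Hypothetical stand-in for a cluster state storage (Redis or Azure Storage)."""

    def __init__(self):
        self.machines = {}  # machine location -> machine id

    def register(self, location, machine_id=None):
        # When no id is forced, allocate the next one.
        if machine_id is None:
            machine_id = len(self.machines) + 1
        self.machines[location] = machine_id
        return machine_id


class TransitionalStorage:
    """Primary is authoritative; secondary mirrors every operation."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def register(self, location):
        machine_id = self.primary.register(location)
        try:
            # Force the same mapping that the primary assigned.
            self.secondary.register(location, machine_id)
        except Exception:
            pass  # secondary results and failures are ignored in this mode
        return machine_id
```

Because the secondary always ends up with the primary's mappings, switching to a secondary-only mode later does not renumber machines.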
Related changes included in this PR:
- Stop marking machines as active in MachineReputation and event processing, because doing so may lead to incorrect decisions. For example, a completed copy can cause machine A to report a good reputation for a machine B that has declared itself inactive; this would lead machine A to believe B is up even when it isn't, which could cause a build failure if A decided to pin based on B's existence.
- Stop metadata service clients from reading the master machine location from ClusterState. Instead, they now read directly from the leader election mechanism. This both helps prevent staleness and simplifies the code.
- If the Azure Storage Emulator is running locally, you can (manually) run all tests against it, and they pass.
Related work items: #1906346
Calculating setup cost is expensive and not worth it. It was already explicitly disabled for a lot of queues, so now is a good time to disable it by default in BuildXL.
We also enable fireForgetMaterializeOutput by default as it was explicitly enabled for all Office builds for a long time.
Our selfhost can get stuck with this error:
```
npm ERR! code ENOENT
npm ERR! syscall open
npm ERR! path E:\BuildXLInternalRepos\BuildXLInt1\Out\Objects\y\z\vq1azq50jol9tla83pntbqz0\node-build-src/package.json
npm ERR! errno -4058
npm ERR! enoent ENOENT: no such file or directory, open 'E:\BuildXLInternalRepos\BuildXLInt1\Out\Objects\y\z\vq1azq50jol9tla83pntbqz0\node-build-src\package.json'
npm ERR! enoent This is related to npm not being able to find a file.
npm ERR! enoent
npm ERR! A complete log of this run can be found in:
npm ERR! E:\BuildXLInternalRepos\BuildXLInt1\Out\Objects\y\z\vq1azq50jol9tla83pntbqz0\npm-install-cache\_logs\2022-01-20T02_12_54_066Z-debug.log
```
It turns out that when `/MT` is used, Robocopy misses some files; in this case, `package.json`.
We do not really need a lock when adding events to the blocking collection for asynchronous logging. We can just change m_isAsyncLoggingEnabled to volatile, so we can guarantee that the updates to this field will be visible to all threads.
Related work items: #1859016
In some builds, pipe reading was cancelled in the middle of the build. It is still unknown what caused the cancellation, because no BuildXL code (except the test code) cancels pipe reading. This PR tries to mitigate the issue by retrying the pipe read when an unknown cancellation is requested. Since the cancellation is not requested by BuildXL itself, retrying the read can increase reliability.
For now, this is an opt-in feature. QuickBuild may use this feature to handle issues in some of their customer build queues.
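The retry behavior can be sketched as follows in Python. The `ReadCancelled` exception, the `read_with_retry` helper, and the retry limit are hypothetical; the PR text does not specify how many retries are attempted.

```python
class ReadCancelled(Exception):
    """Hypothetical signal that a pipe read was cancelled."""


def read_with_retry(read_once, cancelled_by_us, max_retries=3):
    """Call read_once(); if it is cancelled by someone other than us, retry.

    read_once: callable that performs one read and returns the data.
    cancelled_by_us: callable returning True when we requested the cancellation.
    """
    attempts = 0
    while True:
        try:
            return read_once()
        except ReadCancelled:
            # A cancellation we asked for is honored; so is an exhausted budget.
            if cancelled_by_us() or attempts >= max_retries:
                raise
            attempts += 1
```

The key design point is the distinction between expected and unexpected cancellation: only the latter is retried, so a genuine shutdown is never delayed.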
- While a pip step is being processed on a worker, maintain a "pending events" queue associated with that pip (indexed by its semistable hash). Any raised events that are ready to be sent are added to that queue, which is disabled and drained when we're ready to report that pip's result.
- If an event can't be associated with an "active" message queue, it is sent immediately. This deals with the case where events are late and we have already sent the pip result.
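The two rules above can be modeled with a small Python sketch. The `WorkerEventForwarder` class and its message tuples are hypothetical, and flushing queued events immediately before the result is an assumption about ordering rather than something the description states.

```python
from collections import deque


class WorkerEventForwarder:
    """Hypothetical model of per-pip pending event queues on a worker."""

    def __init__(self, send):
        self._send = send    # callable that delivers one message
        self._pending = {}   # semistable hash -> deque of queued events

    def start_pip(self, pip_hash):
        self._pending[pip_hash] = deque()

    def raise_event(self, pip_hash, event):
        queue = self._pending.get(pip_hash)
        if queue is not None:
            queue.append(event)  # pip still active: hold the event
        else:
            self._send(("event", pip_hash, event))  # late event: send now

    def report_result(self, pip_hash, result):
        # Disable the queue, drain it, then report the result.
        for event in self._pending.pop(pip_hash, ()):
            self._send(("event", pip_hash, event))
        self._send(("result", pip_hash, result))
```

Removing the queue from the dictionary is what "disables" it here: any event raised afterward finds no active queue and takes the immediate path.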
Allow restoring checkpoints in the background for global cache. This is to alleviate some of the time required to download all the files when switching masters.
`Cmd.flag` returns `undefined` if the passed value is `false`, so essentially, customers have no way to disable build manifest (it's on by default now).
- Adds cancellation tokens for IArtifactContentCache.TryLoadAvailableContentAsync and ICacheReadOnlySession.PinToCasAsync.
Related work items: #1908176
**_No changes in the logging behavior_**
CloudBuild keyword is used by absolute events.
Performance and DependencyAnalysis keywords are only for categorization purposes. There is no special handling for those keywords unlike Progress, Diagnostics, SelectivelyEnabled, etc.
Related work items: #1859016
These diagnostic events were not supposed to be logged, but we have been logging them. After this change, they will no longer be printed:
```
DistributionWorkerPipOutputContent
DistributionOrchestratorWorkerProcessOutputContent
ScheduleCopyingPipOutputToLocalStorage
ScheduleCopyingPipInputToLocalStorage
```
The other events I changed were already not logged because they were guarded by `ETWLogger.Log.IsEnabled`.
Related work items: #1859016
Integrate BuildXL with AnyBuild SDK.
The design is to isolate AnyBuild use so that the `#if FEATURE_ANYBUILD_PROCESS_REMOTING` doesn't need to appear everywhere.
This PR depends on the AnyBuild.SDK package, which should be published after PR !644510 is pushed.
Related work items: #1907310
There is an implicit requirement from many JS coordinators (yarn/rush) that node.exe be in the PATH of the running process. PATH is controlled by the user, so this is not necessarily the case. When the node.exe location is explicitly specified, we now make sure it is exposed on the PATH of the running graph construction tool.
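A minimal Python sketch of the PATH adjustment, assuming a plain dictionary for the environment. The helper name `env_with_node_on_path` is hypothetical, and the sketch compares directories as exact strings (no case-insensitive matching or path normalization).

```python
import os


def env_with_node_on_path(env, node_exe_path):
    """Return a copy of env whose PATH includes the directory of node_exe_path.

    The directory is prepended so the explicitly specified node wins over
    any other node that may already be on PATH.
    """
    node_dir = os.path.dirname(node_exe_path)
    entries = [entry for entry in env.get("PATH", "").split(os.pathsep) if entry]
    if node_dir not in entries:
        entries.insert(0, node_dir)
    result = dict(env)
    result["PATH"] = os.pathsep.join(entries)
    return result
```

The function returns a new dictionary rather than mutating the caller's environment, so the change only affects the tool process it is passed to.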
- Pass the cancellation token to cache for the TryMaterializeAsync operation.
- This will help improve user experience for users who cancel their build in the middle of materializing a single file from the L3 cache, so that they can avoid waiting for the file to finish downloading before their build finishes.
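To illustrate why passing the token down matters, here is a Python sketch of a chunked materialization that checks for cancellation between chunks. `threading.Event` stands in for a cancellation token, and `materialize` and `MaterializationCancelled` are hypothetical names, not the cache API.

```python
import threading


class MaterializationCancelled(Exception):
    """Hypothetical error raised when a materialization is cancelled."""


def materialize(chunks, sink, cancel: threading.Event):
    """Append chunks to sink, stopping as soon as cancellation is requested."""
    for chunk in chunks:
        # Without this check, the loop would run until the whole file arrived.
        if cancel.is_set():
            raise MaterializationCancelled()
        sink.append(chunk)
```

The check runs once per chunk, so the worst-case delay after a cancel request is the time to receive one chunk rather than the rest of the file.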
Related work items: #1908176
BuildXL Release pipeline currently contains two CloudTest tasks.
Task Cf runs on CloudTest Int, while task Cg runs on CloudTest Prod.
Due to limitations in CloudTest UI, this means the results of the Cg task are not populated in the `1ES Test` tab of Azure DevOps.
To fix this, there were two approaches: either move Cf to Prod or move Cg to Int.
Moving Cf to Prod is more complicated, since there are build pipeline certificates installed and used specifically by CloudTest Int environment, which are not available on Prod.
We are taking the approach of moving Cg task to CloudTest Int. To do this, the SKU and Image used to run the Cg task needed to change, as the currently configured ones are not available on Int environment.
Here's a successful run that uses this configuration on CloudTest Int environment: https://cloudtest.visualstudio.com/CloudTest/_build/results?buildId=57793&view=results
This is part 1 of two changes required for this to work. After this PR completes, the release pipeline task and the CloudTest Validation(Gvfs) pipelines need to be edited to use the correct CloudTest configuration, which involves the following changes.
- Use Int environment instead of Prod
- Use wus2-default stamp
- Use cloudtest tenant instead of BuildXL
This PR fixes a bug recently introduced with https://dev.azure.com/mseng/Domino/_git/BuildXL.Internal/pullrequest/644178 since the caching mechanism for real enumerations does not easily allow pip-specific paths. For that:
* Only globally untracked paths are kept (these are always safe to cache).
* Directory translations are taken into consideration (which was another issue with the previous PR).
* Statically declared untracked files and scopes are left out. We need a different approach to account for these without hindering performance. Although it would be more consistent to include them as well, they are not actually needed for the short-term needs of dev cache.