Commit Graph

6887 Commits

Author SHA1 Message Date
Serge Mera 2a6c56a2c8 Merged PR 763864: DScript spec cleanup
Mostly cache specs, but also touching some shared stuff:
* Unified some Nuget packages (System.Collections.Immutable, System.Memory, etc), and bumped versions to the latest. This triggered a bunch of other updates...
* Removed a lot of unnecessary 'withQualifier'
* Simplified some references by fixing the 'runtimes' folder behavior on the Nuget resolver (which forced a bunch of 'withWinRuntime' calls)

More cleanup passes are possible, but those will come later
2024-02-01 22:30:56 +00:00
Julian Bayardo 69a854a55e Merged PR 764740: Avoid unnecessary rehashing in ephemeral layer
We always need to hash files when uploading to ensure the `ContentHash` matches what we're supposed to be uploading. However, in most code paths we already hash the files when inserting into the local cache, so we've already checked that the file hash matches. In such cases, we can avoid hashing the file twice.
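
The idea can be sketched as follows, in Python for illustration only. `CountingHasher`, `upload`, `put`, and the `already_verified` flag are hypothetical names, not the actual BuildXL API; the sketch only shows how a prior local insert lets the upload skip a second hash.

```python
import hashlib


class CountingHasher:
    """Hashes content and counts how many times hashing actually happened."""

    def __init__(self) -> None:
        self.calls = 0

    def hash(self, data: bytes) -> str:
        self.calls += 1
        # SHA-256 is used purely for illustration.
        return hashlib.sha256(data).hexdigest()


def upload(hasher, data, expected_hash, already_verified=False):
    """Hypothetical upload step: re-hash only if no earlier step verified the content."""
    if not already_verified and hasher.hash(data) != expected_hash:
        raise ValueError("content does not match the expected hash")
    return expected_hash  # stand-in for the real upload


def put(hasher, data, expected_hash):
    """Insert locally (which hashes once), then upload without hashing again."""
    if hasher.hash(data) != expected_hash:  # check performed by the local insert
        raise ValueError("content does not match the expected hash")
    return upload(hasher, data, expected_hash, already_verified=True)
```

With this shape, a `put` hashes the content once, while a direct `upload` with no prior verification still hashes it.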
2024-02-01 21:54:39 +00:00
Pasindu Gunasekara 🍣 b542074324 Merged PR 765015: Updated Release-Notes.md
Updated Release-Notes.md
2024-02-01 21:31:37 +00:00
Julian Bayardo d237a120c9 Merged PR 764819: Increase maximum connections per server 2024-02-01 01:09:25 +00:00
CloudDev Build Account acf0328f4e New LKG 0.1.0-20240130.5
[skip ci]
2024-01-31 23:25:48 +00:00
Julian Bayardo 30e512ea13 Merged PR 764783: Extend elision based on Persistent locations to Ephemeral pin 2024-01-31 23:19:49 +00:00
Julian Bayardo c3439ff700 Merged PR 764728: Address triple concurrent request failure mode
QB inserts the same file hash from 3 different paths at the exact same time. Because we don't limit concurrency per hash at the ephemeral layer, the 3 requests start at the same time.

The 1st request starts up and inserts into the local FSCS, then spends 2s uploading to Azure Storage.

The other 2 requests start up and both fall into the optimization this PR deletes. At this point, we have 1 request uploading and 2 other requests that completed the insert with Success due to the heuristic.

Because QB deduplicates the file uploads based on hash at some layer, it decides that the target has been uploaded, and proceeds to schedule a target in a different worker.

That different worker doesn't need to check for a cache hit because the targets it depends on were built in the same build, so it just proceeds to Place the content. We try to copy the file from the other workers in the build, but they all time out (we've got extremely low timeouts here on purpose), so we fall back to storage. Because this races with the file upload, and the file upload isn't done, storage says "this doesn't exist".

We fail the build!

This failed 4 prod builds today. Please note, removing the optimization actually improves our situation:
1. The case this optimization is meant to optimize will be caught by the call to AllowPersistentPutElisionAsync, which is a lock-free in-memory lookup and so _a priori_ should be quite fast.
2. Calling AllowPersistentPutElision is actually what we truly want to achieve (avoiding the upload if we know it's already in storage).
3. The removal of this heuristic means we no longer need to ensure there's a unique cache location for every build. We can always re-use the same cache and the ephemeral layer won't give incorrect results.
2024-01-31 21:23:57 +00:00
Juan Carlos Guzman Islas 0308565c8a Merged PR 764725: Allows for blob cache authentication using preauthenticated URIs
In the most recent scenarios, we are giving the engines a list of URIs which include the SAS token. We should support this scenario without requiring them to parse anything.
2024-01-31 21:14:01 +00:00
Julian Bayardo 19c0dc4652 Merged PR 764714: Remove unnecessary logging when using Blob-only cache 2024-01-31 21:08:24 +00:00
Juan Carlos Guzman Islas ce064dce7a Merged PR 764457: Make sure blob cache creation blows up if a container failed to be created
Make sure blob cache creation blows up if a container failed to be created. We need to allow the `CreateIfnotExists` call to fail in case we only have read-only permissions. However, if it failed and the container does not in fact exist, the cache should blow up. The hope is that the engines will be able to handle this error and proceed without the remote cache in dev builds.
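
A minimal sketch of the intended behavior, in Python for illustration only. `ContainerClient` is a hypothetical in-memory stand-in, not the Azure SDK, and it assumes that a create call made with read-only credentials fails regardless of whether the container exists.

```python
class ContainerClient:
    """Hypothetical in-memory stand-in for a blob container client."""

    def __init__(self, exists: bool, can_create: bool) -> None:
        self._exists = exists
        self._can_create = can_create

    def create_if_not_exists(self) -> None:
        # Assumption: read-only credentials make this call fail outright.
        if not self._can_create:
            raise PermissionError("read-only credentials cannot create containers")
        self._exists = True

    def exists(self) -> bool:
        return self._exists


def ensure_container(client: ContainerClient) -> None:
    """Tolerate a failed create, but fail hard if the container is really missing."""
    try:
        client.create_if_not_exists()
    except PermissionError:
        if not client.exists():
            raise RuntimeError("container does not exist and could not be created")
```

A caller such as a build engine could catch the final error and continue without the remote cache.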
2024-01-31 20:03:25 +00:00
Oleksii Kononenko b0621456a9 Merged PR 761147: Enable net7 by default
Enable net7 by default

Related work items: #2119665
2024-01-31 19:49:05 +00:00
Semih Okur 1fbdad010a Merged PR 764676: Catch TaskCancelledException for stopping unknown dynamic workers
Catch TaskCancelledException for stopping unknown dynamic workers
2024-01-31 19:44:07 +00:00
Marcelo Lynch 🧉 02dbf47fa7 Merged PR 764188: Change arguments in bxl.ps1 to use new runtime cachemiss mode on ADO
Change arguments in bxl.ps1 to use new runtime cachemiss mode on ADO
2024-01-31 17:21:20 +00:00
Serge Mera 429ce72258 Merged PR 764250: Nuget 'runtimes' folder should take precedence over 'lib' or 'ref'
Nuget 'runtimes' folder should take precedence over 'lib' or 'ref'. Check https://learn.microsoft.com/en-us/nuget/create-packages/supporting-multiple-target-frameworks#Architecture-specific-folders.

Plus some packages specify the runtime architecture as 'win', which we should map to 'win-x64'.
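
The precedence rule can be sketched roughly as below, in Python for illustration only. The function names, the package-relative path layout, and the choice of 'lib' before 'ref' as the fallback order are assumptions made for the example, not the resolver's actual code.

```python
from typing import List


def normalize_rid(rid: str) -> str:
    """Map the bare 'win' runtime identifier to 'win-x64'."""
    return "win-x64" if rid == "win" else rid


def pick_assemblies(package_files: List[str], target_rid: str) -> List[str]:
    """Prefer files under runtimes/<rid>/ for the target; otherwise fall back."""
    runtime_files = []
    for path in package_files:
        parts = path.split("/")
        is_runtime_file = len(parts) >= 3 and parts[0] == "runtimes"
        if is_runtime_file and normalize_rid(parts[1]) == target_rid:
            runtime_files.append(path)
    if runtime_files:
        return runtime_files
    # Fallback order ('lib' before 'ref') is an assumption of this sketch.
    for folder in ("lib", "ref"):
        matches = [p for p in package_files if p.startswith(folder + "/")]
        if matches:
            return matches
    return []
```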
2024-01-31 00:17:52 +00:00
Semih Okur 9306cfcc68 Merged PR 763281: Refactoring attachment logic 2024-01-30 22:20:37 +00:00
Julian Bayardo 7b4f5a9cf5 Merged PR 764397: Drastically reduce the amount of logging produced by gRPC 2024-01-30 19:42:02 +00:00
Marcelo Lynch 🧉 d7c5cb5f04 Merged PR 764217: Fix logic in WorkerService.SayHelloAsync
Fix logic in WorkerService.SayHelloAsync
2024-01-30 19:38:33 +00:00
TSEBuild 2535cde5f0 Merge pull request 764173 from dev/BuildXL/updatepackages-2024-01-29T22-13-19 into main 2024-01-30 00:03:00 +00:00
Julian Bayardo b1196a8a21 Merged PR 763666: Make gRPC clients use a single underlying connection pool
This PR:
- Makes gRPC clients use a single connection pool
- Eliminates a lot of gRPC options being passed around that aren't actually used or plumbed
- Simplifies the gRPC configuration to follow similar settings as AnyBuild does
- Gets us one step closer to enabling gRPC encryption by always enabling server-side encryption on a separate port, and allowing usage of grpcs://, https:// etc in MachineLocation to denote HTTPS connections.
- Makes all instances actually try to enable gRPC encryption on the server side. If it fails it fails, but we try

Validation: https://cloudbuild.microsoft.com/user/jubayard_20240126152912
2024-01-29 23:54:03 +00:00
Michael Pysson 7af95f3d7a Merged PR 763933: Remove dead code related to viewer
Remove dead code related to viewer
2024-01-29 23:51:09 +00:00
Pasindu Gunasekara 🍣 e0d3fe42fc Merged PR 764123: Use net8 for dependency update pipeline
Domino.ReleaseManagement also got moved to net8 recently
2024-01-29 22:06:03 +00:00
CloudDev Build Account c04654ab23 New LKG 0.1.0-20240126.13
[skip ci]
2024-01-29 20:41:40 +00:00
Sahiti Chandramouli b03547ba46 Merged PR 763688: Obtain TCP connections in the machine
Get active TCP ports (active and listening).
Used the IPGlobalProperties class, calling GetActiveTcpConnections() and GetActiveTcpListeners(), to obtain the required information.
Ran the primary and Linux validations to check the status.csv file. Look for TCPActivePorts.
Windows - https://cloudbuild.microsoft.com/build?id=600df07e-bbfa-22b1-1a87-0b9ef88a20c4&bq=BuildXL_Internal_PR
[BuildXL.status (9) (2).csv](https://dev.azure.com/mseng/9ed2c125-1cd5-4a17-886b-9d267f3a5fab/_apis/git/repositories/50d331c7-ea65-45eb-833f-0303c6c2387e/pullRequests/763688/attachments/BuildXL.status%20%289%29%20%282%29.csv)

https://cloudbuild.microsoft.com/build?id=aad2ccb5-679a-42a9-d04a-104cdff2f16b&bq=BuildXL_Internal_PR
[BuildXL.status (10).csv](https://dev.azure.com/mseng/9ed2c125-1cd5-4a17-886b-9d267f3a5fab/_apis/git/repositories/50d331c7-ea65-45eb-833f-0303c6c2387e/pullRequests/763688/attachments/BuildXL.status%20%2810%29.csv)

Linux -  https://dev.azure.com/mseng/Domino/_build/results?buildId=25923717&view=logs&j=e7020543-e8ed-5e17-1ad4-e7bb89160f58&t=a127a243-c15c-5e7c-8982-d28f3304a152
[BuildXL.status (11).csv](https://dev.azure.com/mseng/9ed2c125-1cd5-4a17-886b-9d267f3a5fab/_apis/git/repositories/50d331c7-ea65-45eb-833f-0303c6c2387e/pullRequests/763688/attachments/BuildXL.status%20%2811%29.csv)
2024-01-27 02:49:27 +00:00
Serge Mera 51dc4b154c Merged PR 763946: Enable sending cache logs to kusto for the bxl selfhost
Enable sending cache logs to kusto for the bxl selfhost
2024-01-27 02:03:54 +00:00
Marcelo Lynch 🧉 f7ef843978 Merged PR 763809: Allow unobserved 'HTTP/2 protocol error' exceptions
Related work items: #2142528
2024-01-27 00:20:37 +00:00
Michael Pysson d7479d35c7 Merged PR 763887: Update Microsoft.IdentityModel.* packages to address CG alert
Update Microsoft.IdentityModel.* packages to address CG alert

Related work items: #2138377, #2138378, #2138379, #2138380, #2138381, #2138382, #2138383, #2138384
2024-01-26 23:11:25 +00:00
Pasindu Gunasekara 🍣 3e3c720b92 Merged PR 763891: Update gitignore
- This file gets picked up on Linux because it's case sensitive.
2024-01-26 23:07:50 +00:00
Michael Pysson f2810077bb Merged PR 763855: Reenable disabled unit tests
These tests were disabled years ago and pass locally. Let's see if they are still problematic.
2024-01-26 22:30:55 +00:00
Julian Bayardo 14ad21d64c Merged PR 763375: Remove unused Kusto logger in ContentStoreApp
This PR removes an unused feature that also pulls in quite a few NuGet packages
2024-01-26 22:05:03 +00:00
Pasindu Gunasekara 🍣 a70deaaf3d Merged PR 763860: Temporarily hardcode MacOS runtime package dependency
- Making it a bit harder to build test packages locally since you need a mac to build the runtime package.
- Will revert this once I have a better fix.
2024-01-26 22:00:55 +00:00
Serge Mera 76e437d5f9 Merged PR 763374: Enable uploading cache logs to Kusto
Reuses the NLog component already available for CASaaS. The schema of the kusto table is preserved to make existing kusto queries compatible, even though many of the columns are not meaningful outside of CloudBuild.
For the time being, adding an explicit flag to turn it on for deployment purposes, but eventually the existing /logsToKusto flag will also turn on cache logging.

Instructions/scripts for setting up the ingestion/table are coming afterwards.
2024-01-26 19:28:46 +00:00
Serge Mera 91f5900aaf Merged PR 763807: Display manifest content when CG manifest validation fails
Display manifest content when CG manifest validation fails
2024-01-26 19:28:19 +00:00
Julian Bayardo c51ac35c45 Merged PR 757744: [Ephemeral] Track Azure locations within the content tracker
The ephemeral cache currently assumes that something exists in Azure Storage if it exists in the content tracker (distributed, accessible to all machines) OR if it's in the put elision cache (local to the machine). This causes a series of things to happen:
1. The number of storage accesses is higher than it should be in cases where the content is added in one machine, evicted, and then accessed from another. This scenario is relatively common because the local cache size we use for ephemeral builds is small as we have small drives.
2. A race condition is possible (see deleted comments); it's prevented by waiting, but the cost of that is more storage accesses.
3. In the datacenter wide case, it also means that the accessing of Azure Storage is less efficient than it could be, because the knowledge about storage accesses isn't shared across builds.

This PR adds Azure Storage as a location and modifies things so they take that extra information into consideration.

Example builds: https://cloudbuild.microsoft.com/user/jubayard_20231215164957
Further example builds: https://cloudbuild.microsoft.com/user/jubayard_20240125180851

Related work items: #2122872
2024-01-26 18:32:11 +00:00
Iman Narasamdya cf8690b00b Merged PR 763662: Document support for file streams
Document support for file streams

Related work items: #2136782
2024-01-26 01:00:19 +00:00
Iman Narasamdya 4dc47bed49 Merged PR 763628: [Detours] Fix path canonicalization for near-root paths
For file access reports, Detours in general normalizes/canonicalizes the paths by calling `GetFullPath`.

However, there is one case where Detours does not do that, namely when it removes the last path segment.

Consider the following API call, `FindFirstFile("B:\\*.cpp")`. This is a call to start enumerating `B:\` with the `*.cpp` pattern. Detours needs to report an enumerate access on `B:\`. To do so, it has to remove `*.cpp` from the given input. The last segment removal was so naive that it simply removed everything after, and including, the last `\`. This resulted in Detours reporting `B:`.

The fix is to compare the path with the path root. If the position of last `\` is within the root's length, then we should take the whole root instead.
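
The fix can be sketched as follows, in Python for illustration only (Detours itself is native code). The function names are hypothetical, and `ntpath.splitdrive` stands in for whatever root computation the real code uses.

```python
import ntpath


def naive_remove_last_segment(path: str) -> str:
    """Old behavior: drop everything from the last backslash onwards."""
    return path[: path.rfind("\\")]


def path_root(path: str) -> str:
    """Return the root of a Windows path, e.g. 'B:\\' for 'B:\\*.cpp'."""
    drive, rest = ntpath.splitdrive(path)
    return drive + "\\" if rest.startswith("\\") else drive


def remove_last_segment(path: str) -> str:
    """Fixed behavior: never cut into the root; return the whole root instead."""
    root = path_root(path)
    last_sep = path.rfind("\\")
    if last_sep < len(root):
        return root
    return path[:last_sep]
```

For `B:\*.cpp` the naive version yields `B:`, while the fixed version yields `B:\`; for deeper paths such as `B:\src\*.cpp` both yield `B:\src`.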

Related work items: #2126559
2024-01-25 23:57:54 +00:00
Pasindu Gunasekara 🍣 a08dc2be74 Merged PR 763521: Fix VS solution generation on Linux
- Fixes OS check for csproj files
- Updates logic for finding xunit processes
2024-01-25 21:43:03 +00:00
CloudDev Build Account 5037c81b20 New LKG 0.1.0-20240124.1
[skip ci]
2024-01-25 17:31:56 +00:00
Iman Narasamdya 13f91f7773 Merged PR 763373: Make codex analysis run again by tracking user build in RunCheckinTest
Make codex analysis run again by tracking user build in RunCheckinTest

Related work items: #2142877
2024-01-25 03:13:15 +00:00
Marcelo Lynch 🧉 7f632b7531 Merged PR 763371: Update release notes
[skip ci]
2024-01-25 02:12:40 +00:00
Iman Narasamdya 87e6871878 Merged PR 763122: Include current fingerprint salt when calculating static fingerprint during graph construction from graph fragments
The bug is caused by the following scenario:

Consider a scenario where a build session consists of 2 builds, one consisting of pips for constructing pip graph fragments, and the other stitching the fragments together and executing the pips in the fragments. In the first build session, the 1st build has a pip that constructs a pip P in a graph fragment G with a fingerprint salt A. The fragment is then included in the 2nd build that executes P.

In the next build session, the first build keeps using the fingerprint salt A for constructing the graph fragment G. Thus, in that build, the construction of G has a cache hit. In the second build, one changes the fingerprint salt to B, and expects P to have a cache miss. In this build we will have a graph cache miss because we have a different fingerprint salt B. However, during the graph construction, we still serialize the graph fragments generated in the first build. If we do not include the fingerprint salt B when calculating the pip's static weak fingerprint (when stitching the fragments together), then P will have the same static weak fingerprint as the one in the first build session. This can result in a hit in the incremental scheduling state and make P get skipped during the execution phase, and thus underbuild.
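
A minimal sketch of why the salt matters, in Python for illustration only. `static_fingerprint` and its string inputs are hypothetical; the real static weak fingerprint covers far more than a description string.

```python
import hashlib


def static_fingerprint(pip_description: str, salt: str = "") -> str:
    """Hash the pip description together with the current fingerprint salt."""
    h = hashlib.sha256()
    h.update(pip_description.encode("utf-8"))
    h.update(b"\0")  # separator so description and salt cannot run together
    h.update(salt.encode("utf-8"))
    return h.hexdigest()
```

If the salt is left out, P gets the same fingerprint in both sessions and can be wrongly skipped; with the salt included, changing it from A to B changes the fingerprint and forces a miss.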

Related work items: #2137467
2024-01-25 00:02:14 +00:00
Julian Bayardo 52c1d6611a Merged PR 763290: Prevent ephemeral tracking bug when engine attempts to insert a mismatched file
This PR protects against the following bug:

- QuickBuild tries to Put a file, but hands us a file that doesn't match the hash QuickBuild expects
- We fail to insert it into the local cache, but because we had a previous target that had the scenario above and the target contained the hash that we expected to put, the file is considered to be alive in storage and therefore gets a put elision (i.e., we don't even try calling storage), which means we return that the put was successful!
- QuickBuild then assumes the file is successfully in the cache, and proceeds with life happily.
- Eventually, it adds a fingerprint which references the file that doesn't actually exist!
2024-01-24 23:24:15 +00:00
Pasindu Gunasekara 🍣 5a73d2aba3 Merged PR 763257: Set PAT before trying to download drop on publish step in rolling pipeline 2024-01-24 18:45:28 +00:00
Qi Wang 8c0dde94c1 Merged PR 759642: Environment variable value should be case sensitive
Make environment variable values case sensitive when computing the cache fingerprint.
Currently we ignore case when comparing environment variable values in InputTracker and CacheGraphProvider. The Office team had some builds fail because of this.
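
The difference between the two comparisons can be sketched as below, in Python for illustration only. Both function names are hypothetical and do not correspond to code in InputTracker or CacheGraphProvider.

```python
def value_changed_ignoring_case(old: str, new: str) -> bool:
    """Old behavior: values that differ only by case are treated as equal."""
    return old.casefold() != new.casefold()


def value_changed_case_sensitive(old: str, new: str) -> bool:
    """New behavior: any difference, including case, counts as a change."""
    return old != new
```

With the old comparison, changing a value from `Debug` to `debug` goes unnoticed; with the new one, it is treated as a change.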

Related work items: #2130263
2024-01-24 00:48:09 +00:00
Pasindu Gunasekara 🍣 b13dd0c5db Merged PR 763090: Fix VS solution generation for Linux
- "old" solution generator has some out of date checks for csc pips
2024-01-23 23:16:41 +00:00
Pasindu Gunasekara 🍣 b9f9405fad Merged PR 762760: Interpose __xmknod and __xmknodat
- This was obsoleted in glibc 2.33, but we still need to support it for ubuntu 20.04 which uses glibc 2.31 by default.
- The regular mknod/mknodat calls don't get interposed for whatever reason, even though the symbols exist in glibc < 2.33

Related work items: #2132116
2024-01-23 17:41:56 +00:00
CloudDev Build Account e751d4f5f6 New LKG 0.1.0-20240121.1.1
[skip ci]
2024-01-23 02:15:29 +00:00
Iman Narasamdya d10472ab05 Merged PR 762804: ETW trace logging for Detours services
ETW trace logging for Detours services.

Changes in this branch are ported/adapted from @<Ben Witman> 's branch.

Related work items: #2142165
2024-01-23 01:50:15 +00:00
Sahiti Chandramouli b6c76596c5 Merged PR 761819: Handle CacheInitializationtask failure for dev builds
The GetCacheInitializationTask method returns a task whose result (a failure, if present) is handled immediately for distributed builds. For dev builds, however, we handle it during GraphReuse and ConstructingSchedule.
Based on the stack trace, the issue occurs when the initialization of the cache has failed, but the ConstructSchedule method has decided to proceed with a new graph and returns successfully.
However, an error is logged if there is any failure in the initialization of the cache. This causes the exception we see, as it fails the assertions in the ValidateSuccessMatches method.

Related work items: #2138736
2024-01-23 01:23:59 +00:00
Semih Okur da2dcf75bb Merged PR 762780: Fix NRE when disposing uninitialized workers
Fix NRE when disposing uninitialized workers
2024-01-23 00:25:23 +00:00
Marcelo Lynch 🧉 00cc860537 Merged PR 761892: Use ADO environment variables to infer cache miss analysis keys based on branches related to the build
Retrieve contextual information about the branch the build is running from the Azure DevOps environment variables, and use those values as keys. We consider two scenarios: builds running from a PR trigger and builds that don't.

For builds running on a PR, these are the candidate keys we use:
1. The PR merge-branch name (e.g., /refs/pull/1234/merge)
2. The source branch for the PR (e.g., /refs/dev/chelo/myFeatureBranch)
3. The target branch for the PR (e.g., /refs/heads/main)

The rationale is that we want to get the latest fingerprint store pushed by other iterations of 'this' PR, which will share the PR branch name. Failing that, we fall back to the closest branches that might have been built before in a non-PR build (and thus pushed a fingerprint store using their branch names).

For builds triggered outside of PRs, we just use the branch name, which is BUILD_SOURCEBRANCH. The other variables related to PRs are undefined in this context.

This strategy is effective assuming the target branches are built regularly and that FP stores from those branches will become accessible from the PR builds (i.e., with 'baseline builds' running in branches that are targeted from PRs, and in the same cache namespace as the PRs).
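
The key selection described above can be sketched as follows, in Python for illustration only. `candidate_keys` is a hypothetical name. Apart from BUILD_SOURCEBRANCH, which the message names, the use of SYSTEM_PULLREQUEST_SOURCEBRANCH and SYSTEM_PULLREQUEST_TARGETBRANCH, and of BUILD_SOURCEBRANCH as the merge-branch name on PR builds, are assumptions based on Azure DevOps predefined variables.

```python
from typing import List, Mapping


def candidate_keys(env: Mapping[str, str]) -> List[str]:
    """Return cache miss analysis keys, most specific first."""
    source_branch = env.get("BUILD_SOURCEBRANCH")
    pr_source = env.get("SYSTEM_PULLREQUEST_SOURCEBRANCH")
    pr_target = env.get("SYSTEM_PULLREQUEST_TARGETBRANCH")

    if pr_source and pr_target:
        # PR build: merge branch, then PR source branch, then PR target branch.
        keys = [source_branch, pr_source, pr_target]
    else:
        # Non-PR build: only the branch being built.
        keys = [source_branch]
    return [key for key in keys if key]
```

In practice the mapping passed in would be `os.environ`; a plain dictionary works for experimentation.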
2024-01-23 00:19:18 +00:00