This change modifies how we touch files when uploading content that already exists and when pinning. Currently, we update the last access time by performing a 1-byte download.
In the new version, we instead set the Content-Language to a random value. This works because updating Content-Language updates the Last Modification Time, the ETag, and the Last Access Time. Because this requires write permissions, we do it only for non-read-only sessions.
This also changes GC so that when it reads the last access time from a blob, it also fetches the ETag and the last modification time.
An interesting consequence of this PR is that, in principle, it allows us to disable Last Access Time tracking in the storage accounts that use server GC.
This change is motivated by 6 WDG build failures so far, each caused by the following sequence:
1. A piece of content was added long ago.
2. A build runs and inserts the content; since the content already exists, we touch it instead.
3. Meanwhile, GC runs in parallel, sees the last access time from before the touch, and deletes the content. Because we had no way to prevent the deletion when this race occurred, the build would fail with a CTMIS.
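The race and the fix can be modeled with a small in-memory store. This is a sketch under assumptions: `Blob`, `BlobStore`, `touch`, `read_gc_metadata`, and `delete_if_match` are hypothetical names, and the ETag-conditioned delete models an If-Match style precondition that this description does not spell out. It is not the actual Azure Storage client code.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Blob:
    # Only the properties relevant to touch and GC are modeled.
    etag: str = field(default_factory=lambda: uuid.uuid4().hex)
    content_language: str = ""
    last_modified: int = 0
    last_access: int = 0


class BlobStore:
    """In-memory stand-in for a blob container (illustrative only)."""

    def __init__(self) -> None:
        self.blobs: Dict[str, Blob] = {}
        self.clock = 0

    def touch(self, name: str) -> None:
        # Setting a random Content-Language is a write, so the last modification
        # time, the ETag, and the last access time all change together.
        self.clock += 1
        blob = self.blobs[name]
        blob.content_language = uuid.uuid4().hex[:8]
        blob.etag = uuid.uuid4().hex
        blob.last_modified = self.clock
        blob.last_access = self.clock

    def read_gc_metadata(self, name: str) -> Tuple[int, str]:
        # GC reads the last access time together with the ETag.
        blob = self.blobs[name]
        return blob.last_access, blob.etag

    def delete_if_match(self, name: str, expected_etag: str) -> bool:
        # Assumed precondition: delete only if the blob is unchanged since GC read it.
        blob = self.blobs.get(name)
        if blob is None or blob.etag != expected_etag:
            return False
        del self.blobs[name]
        return True


store = BlobStore()
store.blobs["content"] = Blob()                        # 1. added long ago
last_access, etag = store.read_gc_metadata("content")  # 3. GC reads the old state
store.touch("content")                                 # 2. a build touches it concurrently
deleted = last_access < 1 and store.delete_if_match("content", etag)
print(deleted, "content" in store.blobs)               # False True
```

Because the touch changes the ETag, GC's stale snapshot no longer matches and the delete is refused. A 1-byte download does not change the ETag, so it gives GC no such signal.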
Reverts !785193
#Issue
- Office needs to specify client keys for the index source in the symbol drop. Because the same file path is indexed many times across different builds, the client key must be unique enough to avoid collisions. However, passing it to the build graph as an argument or environment variable introduces unwanted cache misses.
- A custom client key cannot be passed for each file under a directory when calling `addDirectoryToSymbol`.
- There is no filter for excluding files when calling `addDirectoryToSymbol`.
#Solution
- New argument `clientKeyPrefixForSymbolEnv` for the symbol daemon: it specifies the name of an environment variable whose value is prepended to the custom client key before the key is sent to the symbol drop. The value changes from build to build, but it is passed only to the IPC pip (which always runs) and does not affect non-IPC pips. This is the best way I could find to avoid unnecessary cache misses.
- New argument `directoryContentFilter` for `addDirectoryToSymbol` to filter which files under the directory are sent to the symbol drop.
- New argument `directoryPathToClientKey` for `addDirectoryToSymbol` to specify client keys: the directory path is replaced with this value, so the client key of each file under that directory is composed as `<directoryPathToClientKey>/<relative path under the directory>`.
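How the three arguments might combine can be sketched in Python. The helper names, modeling `directoryContentFilter` as a glob pattern, the example environment variable `SYMBOL_KEY_PREFIX`, and the `/` separator between prefix and key are all assumptions for illustration; only the `<directoryPathToClientKey>/<relative path under the directory>` composition and the prepending of the environment value come from this PR.

```python
import fnmatch
import os
import posixpath
from typing import Dict, Iterable, Mapping, Optional


def compose_client_keys(directory: str, files: Iterable[str],
                        directory_path_to_client_key: str,
                        directory_content_filter: Optional[str] = None) -> Dict[str, str]:
    """Map each file under `directory` to `<directoryPathToClientKey>/<relative path>`."""
    keys = {}
    for path in files:
        relative = posixpath.relpath(path, directory)
        # The filter is modeled as a glob pattern here (an assumption).
        if directory_content_filter and not fnmatch.fnmatchcase(relative, directory_content_filter):
            continue  # excluded: not sent to the symbol drop
        keys[path] = f"{directory_path_to_client_key}/{relative}"
    return keys


def apply_client_key_prefix(client_key: str, prefix_env_name: str,
                            environ: Mapping[str, str] = os.environ) -> str:
    # The daemon is told which env var to read; its value is prepended to the key.
    # The "/" separator is hypothetical.
    prefix = environ.get(prefix_env_name, "")
    return f"{prefix}/{client_key}" if prefix else client_key


files = ["/out/bin/a.pdb", "/out/bin/sub/b.pdb", "/out/bin/readme.txt"]
keys = compose_client_keys("/out/bin", files, "office/x64", "*.pdb")
print(keys)
# {'/out/bin/a.pdb': 'office/x64/a.pdb', '/out/bin/sub/b.pdb': 'office/x64/sub/b.pdb'}
print(apply_client_key_prefix(keys["/out/bin/a.pdb"], "SYMBOL_KEY_PREFIX",
                              {"SYMBOL_KEY_PREFIX": "build-20240501.3"}))
# build-20240501.3/office/x64/a.pdb
```

Keeping the changing prefix in an environment variable read only by the daemon is what keeps it out of the fingerprints of non-IPC pips.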
We want to log some infrastructure issues as verbose so they are not surfaced to users, while still keeping track of them; to do this, we include them in the internal warnings.
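The pattern can be sketched with Python's standard `logging` module, mapping "verbose" to `DEBUG`. `InfraIssueTracker` and `report_infrastructure_issue` are hypothetical names, not the BuildXL logging API.

```python
import logging
from typing import List


class InfraIssueTracker:
    """Logs infrastructure issues at DEBUG (verbose) level and also records
    them so they can be reported later as internal warnings (hypothetical)."""

    def __init__(self, logger: logging.Logger) -> None:
        self.logger = logger
        self.internal_warnings: List[str] = []

    def report_infrastructure_issue(self, message: str) -> None:
        self.logger.debug(message)               # not shown at the default WARNING level
        self.internal_warnings.append(message)   # still tracked internally


tracker = InfraIssueTracker(logging.getLogger("infra"))
tracker.report_infrastructure_issue("cache endpoint timed out; retrying")
print(tracker.internal_warnings)  # ['cache endpoint timed out; retrying']
```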
Changes to make CodeQL happy. Most of the changes suppress CodeQL warnings:
1. eval/function constructions - we use these exclusively in tests
2. weak cryptography - we do not use it for cryptographic purposes
Related work items: #2152932
There were three test failures:
1. The operation in the test was not using OS utilities for the probe operation.
2. On Ubuntu 22.04, the cp command reports a read access before the probe access, which is unexpected. The read access is cached, so the sandbox skips the subsequent probe access on the same path. Added a version check before asserting the event log ID count.
3. Ubuntu 22.04 uses fstat instead of __fxstat. The test now asserts the specific system call based on the Ubuntu version.
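Items 2 and 3 both amount to choosing the expected behavior from the Ubuntu release before asserting. A minimal Python sketch of that selection; the function names and reading `VERSION_ID` from `/etc/os-release` text are assumptions, and treating releases after 22.04 like 22.04 is a guess not stated in this PR.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, int]


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines in the /etc/os-release format."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value.strip('"')
    return fields


def ubuntu_version(os_release_text: str) -> Optional[Version]:
    fields = parse_os_release(os_release_text)
    if fields.get("ID") != "ubuntu" or "VERSION_ID" not in fields:
        return None
    major, _, minor = fields["VERSION_ID"].partition(".")
    return int(major), int(minor or "0")


def expected_stat_call(version: Version) -> str:
    # Item 3: 22.04 calls fstat; older releases go through __fxstat.
    # Assumption: later releases behave like 22.04.
    return "fstat" if version >= (22, 4) else "__fxstat"


def expect_probe_event_after_read(version: Version) -> bool:
    # Item 2: on 22.04, cp reads before probing, and the cached read
    # makes the sandbox skip the later probe on the same path.
    return version < (22, 4)


sample = 'NAME="Ubuntu"\nID=ubuntu\nVERSION_ID="22.04"\n'
version = ubuntu_version(sample)
print(version, expected_stat_call(version), expect_probe_event_after_read(version))
# (22, 4) fstat False
```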
Related work items: #2182474
This change modifies how we touch files when uploading content that already exists and when pinning. Currently, we update the last access time by performing a 1-byte download.
In the new version, we instead set the Content-Language to a random value. This works because updating Content-Language updates the Last Modification Time, the ETag, and the Last Access Time. Because this requires write permissions, we do it only for non-read-only sessions.
This also changes GC so that when it reads the last access time from a blob, it also fetches the ETag and the last modification time.
An interesting consequence of this PR is that, in principle, it allows us to disable Last Access Time tracking in the storage accounts that use server GC.
This change is motivated by 6 WDG build failures so far, each caused by the following sequence:
1. A piece of content was added long ago.
2. A build runs and inserts the content; since the content already exists, we touch it instead.
3. Meanwhile, GC runs in parallel, sees the last access time from before the touch, and deletes the content. Because we had no way to prevent the deletion when this race occurred, the build would fail with a CTMIS.
Potentially causing incident 507986920
Revert "Merged PR 784626: Allowlist rules should not be affected by '\\?\' path prefix"
This reverts commit c581dcee03.
Related work items: #2165593
Use System.DefaultWorkingDirectory for downloading artifacts in publish symbols instead of BuildXL.ExternalLKGsPackagePath, which is a release pipeline variable and needs to be passed through queue-time variables.
CaSaaS is currently on net7 and may need updated bxl bits while it moves to net8. This PR re-adds the net7 qualifier; it is essentially a revert of `0a4d5b272900cfb6ad2c13795650e269500f9d17` and `5380d614bb98bfebb5a63900bb617ff51e93e56e`.
7.0.18 has a vulnerability, so this PR also updates the net7 bits to 7.0.19 to make CG happy.
Note: MSBuild is back on net7 (the net7->net8 move for MSBuild was part of the net7 removal).
Untrack /usr/local/include and /usr/x86_64-linux-gnu by default
Report Checker::Lookup for absent path in readlink
Fix a small bug where, when pathname was empty, we added an extra slash at the end of the path
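A minimal sketch of the corrected join in Python, assuming the buggy code unconditionally appended `"/" + pathname` to a resolved directory path. `resolve_relative` is a hypothetical name, not the sandbox's function.

```python
def resolve_relative(dir_path: str, pathname: str) -> str:
    """Build a full path from a directory path (e.g. one resolved from a dirfd)
    and a pathname argument."""
    if not pathname:
        # Appending "/" + pathname unconditionally would turn
        # "/home/user" into "/home/user/" here.
        return dir_path
    if pathname.startswith("/"):
        return pathname  # absolute pathnames do not depend on the directory
    return dir_path.rstrip("/") + "/" + pathname


print(resolve_relative("/home/user", ""))      # /home/user
print(resolve_relative("/home/user", "file"))  # /home/user/file
print(resolve_relative("/", "etc"))            # /etc
```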
Related work items: #2137346
We should use `real_` only when we are certain that it is okay to modify errno (for example, during sending we know that we will restore errno at the end), or when we want to examine the `errno` of that operation ourselves (there are no instances of this in the codebase today). Whenever in doubt, use `internal_`, which automatically restores errno.
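The rule can be modeled in Python with a simulated process-wide errno. `SimulatedLibc`, `real_open`, `preserved_errno`, and `internal_open` are hypothetical stand-ins for the `real_`/`internal_` wrapper families, not the actual sandbox code; the point is only the save-and-restore discipline.

```python
import errno
from contextlib import contextmanager
from typing import Iterator


class SimulatedLibc:
    """Toy model: a process-wide errno plus one call that can set it."""

    def __init__(self) -> None:
        self.errno = 0

    def real_open(self, path: str) -> int:
        # Like a raw libc call: failure overwrites errno.
        if not path:
            self.errno = errno.ENOENT
            return -1
        return 3  # pretend file descriptor


@contextmanager
def preserved_errno(libc: SimulatedLibc) -> Iterator[None]:
    saved = libc.errno
    try:
        yield
    finally:
        libc.errno = saved  # undo whatever the wrapped call did to errno


def internal_open(libc: SimulatedLibc, path: str) -> int:
    # internal_ flavor: same call, errno restored before returning.
    with preserved_errno(libc):
        return libc.real_open(path)


libc = SimulatedLibc()
libc.errno = errno.EAGAIN          # value the traced program set itself
internal_open(libc, "")            # sandbox-internal call fails...
print(libc.errno == errno.EAGAIN)  # True: the program's errno survives
libc.real_open("")
print(libc.errno == errno.ENOENT)  # True: real_ clobbered it
```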
Related work items: #2180657
Use queue-time variables for the rolling build number and ID.
When a pipeline run is scheduled from the release pipeline, the build ID and number used to download artifacts from the rolling build need to be passed through queue-time variables.
In a pipeline with two "kinds" of builds (say, "debug" and "ship"), it is better to split the FP store namespace between them: each build is then always compared against the latest FP store from a build of the same "kind".
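A minimal sketch of the split, assuming the namespace is derived from a base name plus the build kind and that lookup picks the latest store within that namespace. `FingerprintStore`, `fp_store_namespace`, `latest_store_for`, and the `.` separator are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FingerprintStore:
    namespace: str
    build_number: int


def fp_store_namespace(base: str, build_kind: str) -> str:
    # One namespace per build "kind" (e.g. debug vs. ship).
    return f"{base}.{build_kind}"


def latest_store_for(stores: List[FingerprintStore], base: str,
                     build_kind: str) -> Optional[FingerprintStore]:
    namespace = fp_store_namespace(base, build_kind)
    same_kind = [s for s in stores if s.namespace == namespace]
    return max(same_kind, key=lambda s: s.build_number, default=None)


stores = [
    FingerprintStore("pipeline.debug", 101),
    FingerprintStore("pipeline.ship", 102),
    FingerprintStore("pipeline.debug", 103),
]
print(latest_store_for(stores, "pipeline", "ship"))
# FingerprintStore(namespace='pipeline.ship', build_number=102)
```

With a single shared namespace, the ship build in this example would have been compared against debug build 103.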
Revert "Merged PR 785648: Enable incremental binary analysis in 1ESPT for rolling build
Enable incremental binary analysis in 1ESPT for rolling build"
Reverted commit `221bb40f`.
It looks like we only care about O_NOFOLLOW, and we hardcode it ourselves most of the time; we are not really using whatever 'mode' is specified in the various system calls. So instead of making the mode part of the 'event' and using the OS constant values (like NOFOLLOW), we should just use an enum.
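A sketch of the idea in Python: the event carries a small enum instead of raw `open(2)` flags, and the OS-specific constant is consulted only once, at the boundary. `ResolveMode`, `AccessEvent`, and `mode_from_open_flags` are hypothetical names; `os.O_NOFOLLOW` is the real Python constant, available on POSIX platforms.

```python
import os
from dataclasses import dataclass
from enum import Enum


class ResolveMode(Enum):
    # The only distinction the events need.
    FOLLOW = 0
    NO_FOLLOW = 1


@dataclass(frozen=True)
class AccessEvent:
    path: str
    mode: ResolveMode  # instead of raw, OS-specific flag values


def mode_from_open_flags(flags: int) -> ResolveMode:
    # Translate once, where the system call is intercepted.
    return ResolveMode.NO_FOLLOW if flags & os.O_NOFOLLOW else ResolveMode.FOLLOW


event = AccessEvent("/tmp/link", mode_from_open_flags(os.O_RDONLY | os.O_NOFOLLOW))
print(event.mode)  # ResolveMode.NO_FOLLOW

# Calls that never follow the final symlink (e.g. lstat) can hardcode the mode.
lstat_event = AccessEvent("/tmp/link", ResolveMode.NO_FOLLOW)
```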
Related work items: #2171798