mirror of https://github.com/microsoft/BuildXL.git
Merged PR 763662: Document support for file streams
Document support for file streams Related work items: #2136782
This commit is contained in:
Parent
4dc47bed49
Commit
cf8690b00b
|
@ -68,6 +68,7 @@ This is the primary documentation for BuildXL (Microsoft Build Accelerator). If
|
|||
* [Server Mode](Wiki/Advanced-Features/Server-Mode.md)
|
||||
* [Source Change Affected Inputs](Wiki/Advanced-Features/Source-Change-Affected-Inputs.md)
|
||||
* [Symlinks and Junctions](Wiki/Advanced-Features/Symlinks-and-Junctions.md)
|
||||
* [File streams](Wiki/Advanced-Features/File-Streams.md)
|
||||
* [Timestamp Faking](Wiki/Advanced-Features/Timestamp-Faking.md)
|
||||
* [Unsafe Flags](Wiki/How-To-Run-BuildXL/Unsafe-flags.md)
|
||||
* [BuildXL Command line Flags](Wiki/Flags.md)
|
||||
|
|
|
@ -6,7 +6,7 @@ Consumption is divided into 2 categories:
|
|||
1. Child Processes (Pips)
|
||||
|
||||
|
||||
### Passing variables to BuildXL
|
||||
## Passing variables to BuildXL
|
||||
Environment variables can be set prior to launching bxl.exe or be specified on the command-line using the `/p` option:
|
||||
|
||||
`bxl.exe /p:x=1`
|
||||
|
|
|
@ -1,4 +1,6 @@
|
|||
## Background
|
||||
# Cache miss analysis
|
||||
|
||||
## Cache lookup
|
||||
A simplified explanation of BuildXL's cache lookup algorithm is:
|
||||
1. Collect the inputs of a process pip. These can be file content, whether files exist, directory enumeration listings, process command lines, environment variables, the identity of the process itself (e.g. the hash of the executable), and other BuildXL settings that influence how a process is run (e.g. which file accesses are untracked).
|
||||
1. Create a "fingerprint" which is a hash of all of those inputs and query the cache to see if it has a record from a previous invocation.
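The two steps above can be sketched as follows. This is a minimal illustration, not BuildXL's implementation: the JSON serialization, the SHA-256 hash, the dictionary-based cache, and all names (`compute_fingerprint`, `lookup`, the sample inputs) are assumptions made for the example.

```python
import hashlib
import json


def compute_fingerprint(inputs: dict) -> str:
    """Hash a canonical serialization of the collected pip inputs.

    Assumption: JSON with sorted keys plus SHA-256 stands in for whatever
    serialization and hash the real engine uses.
    """
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def lookup(cache: dict, inputs: dict):
    """Return the record stored for these inputs, or None on a cache miss."""
    return cache.get(compute_fingerprint(inputs))


# Hypothetical inputs collected for one process pip.
inputs = {
    "command_line": "cl.exe /c a.cpp",
    "environment": {"PATH": "C:\\tools"},
    "file_hashes": {"a.cpp": "abc123"},
    "untracked_paths": [],
}

# A previous invocation stored its record under the fingerprint.
cache = {compute_fingerprint(inputs): "outputs-from-previous-build"}

# Changing any input (here, the content hash of a.cpp) changes the fingerprint.
changed = dict(inputs, file_hashes={"a.cpp": "def456"})
```

Looking up `inputs` finds the stored record, while looking up `changed` misses because its fingerprint differs.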
|
||||
|
|
|
@ -1,3 +1,7 @@
|
|||
# Execution analyzer
|
||||
|
||||
# Analysis tool
|
||||
|
||||
bxlAnalyzer.exe is a tool that lives next to bxl.exe. It can provide analysis of the execution of a build and its graph. It operates on execution log files (.xlg), which live in BuildXL's log directory.
|
||||
|
||||
The analyzer works on both Windows and macOS (unless otherwise specified). For macOS, remove the ".exe" extension from the examples and correct the paths.
|
||||
|
|
|
@ -0,0 +1,29 @@
|
|||
# File stream support
|
||||
|
||||
In the NTFS file system, a stream is data (a sequence of bytes) written to a file that gives extra information about the file beyond its attributes and properties. For details about streams, see [NTFS file streams](https://learn.microsoft.com/en-us/windows/win32/fileio/file-streams).
|
||||
|
||||
Currently, BuildXL does not support file stream accesses in the following sense: BuildXL can execute a process pip that accesses file streams, but
|
||||
BuildXL may fail with access violations because streams are not representable in Detours' file access manifest, or will drop reports of
|
||||
stream accesses altogether.
|
||||
|
||||
## Detours' logic for file stream accesses
|
||||
|
||||
Detours does not give special treatment to default streams. However, because paths with streams are not representable in BuildXL's
|
||||
internal path structure, such paths cannot be included in Detours' file access manifest, and thus such accesses can result in access denied.
|
||||
For example, an access to `file::$DATA` results in access denied.
|
||||
|
||||
On the other hand, Detours always allows accesses to files with non-default streams. For example, accesses to `file:test_stream:$Data` and `file:test_stream` are always allowed without checking the manifest.
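To make the three cases concrete, here is a small sketch that splits a path using the NTFS `name:stream:type` syntax and classifies the access the way this section describes. It only mirrors the prose above and is not Detours code; the function names `split_stream` and `describe_access` are hypothetical, and drive-relative paths such as `C:file` are not handled.

```python
def split_stream(path: str):
    """Split 'name[:stream[:type]]' on the final path component.

    Only the last component can carry a stream, which keeps a drive
    prefix such as 'C:\\' intact.
    """
    sep = max(path.rfind("\\"), path.rfind("/"))
    directory, leaf = path[: sep + 1], path[sep + 1:]
    parts = leaf.split(":")
    name = parts[0]
    stream = parts[1] if len(parts) > 1 else ""
    stream_type = parts[2] if len(parts) > 2 else ""
    return directory + name, stream, stream_type


def describe_access(path: str) -> str:
    """Classify an access according to the behavior described in the text."""
    _, stream, stream_type = split_stream(path)
    if not stream and not stream_type:
        return "plain path: checked against the manifest"
    if not stream:
        # e.g. file::$DATA, the default (unnamed) stream
        return "default stream: not representable in the manifest, may be denied"
    # e.g. file:test_stream or file:test_stream:$Data
    return "non-default stream: allowed without a manifest check"
```

For instance, `describe_access("file::$DATA")` reports the default-stream case, while `describe_access("file:test_stream")` reports the non-default case.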
|
||||
|
||||
## Pip executor's logic for file stream accesses
|
||||
|
||||
Even though Detours does not give special treatment to the default stream, the pip executor will drop any reported file access with a path
|
||||
that cannot be parsed by BuildXL's internal path structure, including filenames with streams.
|
||||
|
||||
Moreover, special streams are not transferred through the cache and are not taken into account when hashing the file.
|
||||
|
||||
## Supporting file streams
|
||||
|
||||
BuildXL currently does not support file streams. To properly handle streams, there are a lot of changes needed, from the cache, our internal file
|
||||
info (and file materialization info) structure, down to the Detours level.
|
||||
|
||||
In the near future, we may want to support default streams because, for example, accessing `file::$DATA` is the same as accessing `file`.
|
|
@ -1,9 +1,11 @@
|
|||
# Pip graph reuse
|
||||
|
||||
BuildXL constructs pip graphs by parsing and evaluating the DScript specifications (or specs). When there are a lot of specs, the graph construction can take some time.
|
||||
Luckily, BuildXL has a capability of reusing pip graphs that were constructed in the previous builds. To allow graph reuse, users need to ensure that the inputs used
|
||||
to construct the pip graph are stable. These inputs include the evaluated spec files, enumerated directories, the environment variables queried (read/probed) by the spec files during
|
||||
evaluation, and the mounts specified in the build.
|
||||
|
||||
# Pip graph (weak) fingerprint
|
||||
## Pip graph (weak) fingerprint
|
||||
|
||||
One key used by BuildXL to identify a pip graph is a so-called *pip graph weak fingerprint*. If the fingerprint changes, then the pip graph cannot be reused.
|
||||
|
||||
|
@ -20,7 +22,7 @@ These elements can affect the pip graph being constructed. For example, if the l
|
|||
can change. Similarly for subst target. For example, if the subst target changes from `B:` to `A:`, then all paths in the graph produced during evaluation will
|
||||
start with `A:` instead of `B:`.
|
||||
|
||||
# Inputs for constructing pip graphs
|
||||
## Inputs for constructing pip graphs
|
||||
|
||||
When BuildXL evaluates the specs to construct a pip graph, it keeps track of elements that affect the shape and content of the constructed pip graph. Those elements are
|
||||
- the evaluated spec files (or any read file) and their content hashes,
|
||||
|
@ -36,7 +38,7 @@ Also, during evaluation, the spec can call `Environment.getStringValue(...)` or
|
|||
if `Environment.hasVariable("DROP_ENABLED")` returns true, then the pip graph will have extra drop pips. The result of `Environment.getStringValue("CXX_FLAGS")` can
|
||||
be embedded in the arguments of some pips, and thus changing the value of `%CXX_FLAGS%` affects the constructed pips. Similarly for mounts: the spec can call `Context.getMount(...)` whose result may be used in some pip specification.
|
||||
|
||||
# Engine cache vs. real cache
|
||||
## Engine cache vs. real cache
|
||||
|
||||
The reused pip graph can come from the engine cache, a folder typically set in the object or output directory, or from the real cache.
|
||||
The former stores only the most recently constructed graph, while the latter can have many pip graphs from previous builds. Because it is cheap to check
|
||||
|
@ -50,16 +52,16 @@ for checking the pip graph inputs, i.e., besides tracking the content hashes of
|
|||
If the pip graph in the engine cache cannot be reused, then BuildXL will consult the real cache. The pip graph weak fingerprint and the hash of the pip graph inputs
|
||||
become the key of the cache lookup.
|
||||
|
||||
# Evaluation filter
|
||||
## Evaluation filter
|
||||
|
||||
Users may pass different evaluation filters when constructing the pip graph. The evaluation filters only make the constructed pip graph a subgraph of the one that is constructed
|
||||
without any evaluation filter. Thus, the latter graph is actually reusable when the evaluation filters are applied.
|
||||
|
||||
# Making pip graph caching work
|
||||
## Making pip graph caching work
|
||||
|
||||
To get a pip graph cache hit, users need to keep the pip graph weak fingerprint and the pip graph inputs stable from build to build. Here are some guidelines for keeping the graph inputs stable.
|
||||
|
||||
## Do not query constantly changing environment variables
|
||||
### Do not query constantly changing environment variables
|
||||
|
||||
A typical case of a constantly changing environment variable is one that keeps track of the build ID. For example,
|
||||
```
|
||||
|
@ -74,7 +76,7 @@ set BUILD_ID=Build_%date:~7,2%_%date:~4,2%_%date:~10,4%_%time:~0,2%_%time:~3,2%
|
|||
bxl /c:config.dsc ...
|
||||
```
|
||||
|
||||
## Use BuildXL's log folder mount for current build's log
|
||||
### Use BuildXL's log folder mount for current build's log
|
||||
|
||||
Related to the constantly changing environment variable, users typically invent their own log folders based on the current time. For example,
|
||||
```
|
||||
|
@ -102,7 +104,7 @@ Transformer.execute({
|
|||
});
|
||||
```
|
||||
|
||||
## Enable user-profile redirection
|
||||
### Enable user-profile redirection
|
||||
|
||||
For graph caching across machines, do not query or embed user profile or domain in the pip specification without enabling user-profile redirection.
|
||||
For example,
|
||||
|
|
|
@ -1,3 +1,5 @@
|
|||
# Incremental scheduling
|
||||
|
||||
Incremental scheduling in BuildXL is a feature that allows BuildXL to avoid processing pips based on the filesystem's change (or USN) journal records tracked from the previous builds. Incremental scheduling can also be viewed as a technique to prune the pip graph on up-to-date pips. Incremental scheduling can be enabled by passing `/incrementalScheduling+`.
|
||||
|
||||
## Pip processing
|
||||
|
@ -20,7 +22,7 @@ Suppose that we have the following pip dependency graph:
|
|||
Assume that our cache is empty at the beginning. The first build of the above graph, which is a clean build, performs steps 1, 3, 4 for every process pip in the graph.
|
||||
Now, before we do the next build, we modify `fileD`. Without incremental scheduling, in the next build, BuildXL performs steps 1, 3, 4 for `Process3` and `Process4` (assuming `fileE` changes after `Process3` execution), and performs steps 1 and 2 for `Process1` and `Process2`. That is, without incremental scheduling, BuildXL computes pip fingerprints and performs cache lookups, even though the outputs of `Process1` and `Process2` exist and do not change from the previous build.
|
||||
|
||||
## Incremental scheduling
|
||||
## How it works?
|
||||
With incremental scheduling, during the build, BuildXL tracks the filesystem journal USN records of input and output files. When a file is modified, BuildXL is able to compare the recorded USN with the current one, and if the USNs are different, then BuildXL marks the consuming/producing pips dirty, i.e., the pips need to be processed. (Invariant: If a pip is marked dirty, all its transitive dependents are marked dirty.)
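The dirty-marking rule and its invariant can be sketched as a graph traversal. This is an illustration only: the plain dictionaries, the integer USNs, and the pip names (`ProcA`, `ProcB`, `ProcC`, `ProcX`) are hypothetical and do not reflect BuildXL's internal data structures.

```python
from collections import deque


def pips_with_changed_files(recorded_usn, current_usn, file_to_pips):
    """Return pips that consume or produce a file whose USN differs."""
    changed = set()
    for path, usn in current_usn.items():
        if recorded_usn.get(path) != usn:
            changed.update(file_to_pips.get(path, ()))
    return changed


def mark_dirty(dependents, changed_pips):
    """Mark the changed pips and all their transitive dependents dirty."""
    dirty = set(changed_pips)
    queue = deque(changed_pips)
    while queue:
        pip = queue.popleft()
        for dependent in dependents.get(pip, ()):
            if dependent not in dirty:
                dirty.add(dependent)
                queue.append(dependent)
    return dirty


# Hypothetical graph: ProcA -> ProcB -> ProcC, plus an unrelated ProcX.
dependents = {"ProcA": ["ProcB"], "ProcB": ["ProcC"], "ProcX": []}
file_to_pips = {"in.txt": ["ProcA"], "other.txt": ["ProcX"]}
recorded = {"in.txt": 100, "other.txt": 200}
current = {"in.txt": 101, "other.txt": 200}

changed = pips_with_changed_files(recorded, current, file_to_pips)
dirty = mark_dirty(dependents, changed)
```

Here only `in.txt` has a new USN, so `ProcA` and its transitive dependents become dirty while `ProcX` stays clean and can be pruned from the schedule.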
|
||||
|
||||
In the above example, BuildXL is able to mark that `Process1` and `Process2` are clean, and thus it can avoid scheduling those process pips. In essence, BuildXL will only see the following pip graph:
|
||||
|
@ -53,7 +55,7 @@ Incremental scheduling is incompatible with distributed builds because pips buil
|
|||
|
||||
BuildXL disables incremental scheduling completely when distributed build is requested.
|
||||
|
||||
## Uncachable allowlist
|
||||
### Uncachable allowlist
|
||||
BuildXL allows users to specify files in the configuration file that are allowlisted when checking for access violations. Some pips may produce or consume files in that allowlist. If those files are in the so-called uncacheable allowlist, then those pips will not be cached, and so they are expected to be executed in every build. For example, suppose that `fileX` below is in the uncacheable
|
||||
allowlist, (e.g. `Process42` outputs its date/time of execution to `fileX`):
|
||||
```
|
||||
|
@ -65,10 +67,7 @@ allowlist, (e.g. `Process42` outputs its date/time of execution to `fileX`):
|
|||
|
||||
Incremental scheduling marks such a pip perpetually dirty, and so BuildXL will keep processing the pip. This essentially makes the incremental scheduling feature sub-optimal.
|
||||
|
||||
## Anti-dependency
|
||||
Anti-dependencies are caused by probing non-existent files, such as a search for a C++ header file in an ordered list of search directories. BuildXL observes anti-dependencies during pip executions. However, incremental scheduling does not use the result of such observations. Currently, when a user introduces a file that was probed absent in the prior builds, then incremental scheduling will simply assume that all nodes in the graph are dirty.
|
||||
|
||||
## Lazy output materialization
|
||||
### Lazy output materialization
|
||||
BuildXL has a feature that lazily materializes outputs when a pip can be run from the cache (`/enableLazyOutputs`). Let's consider the following example:
|
||||
```
|
||||
Process1 <-- fileX <-- Process2 <-- fileY <-- Process3
|
||||
|
@ -81,18 +80,18 @@ In the above scenario, incremental scheduling will mark `Process2` and `Process3
|
|||
|
||||
If we do another build without changing anything, then at the beginning of the build, before processing the pips, incremental scheduling will change the marker of `Process1` to dirty, which in turn makes `Process2` and `Process3` dirty as well. Thus, in this case incremental scheduling cannot prune the graph. Suppose that incremental scheduling did not change the marker of `Process1` to dirty. If now the user modifies `fileZ` and requests to build `Process2`, then `Process2` needs to execute, but since `Process1` is clean, `Process2` simply assumes that `fileX` is on disk, but it is not. This is the rationale of having clean and materialized markers for pips.
|
||||
|
||||
## Shared opaque directory
|
||||
### Shared opaque directory
|
||||
Process pips can produce output directories. The contents of those directories are unknown until the pips produce them. The output directories can be consumed directly by pips, without the consuming pips knowing the contents of the directories. Such output directories can also be shared by more than one process pip. Such directories are called shared opaque directories.
|
||||
|
||||
For correctness, the contents of shared opaque directories are deleted before BuildXL begins pip processing. Thus, in principle, pips that produce shared opaque directories need to always be processed. Incremental scheduling marks such pips perpetually dirty.
|
||||
|
||||
## Unflushed page cache
|
||||
### Unflushed page cache
|
||||
Incremental scheduling tracks input and output files by recording their USN records. To get a stable record for an output file, BuildXL flushes the page cache to the file-system.
|
||||
Unfortunately, on spinning disks, page cache flushes can be expensive. If we turn off page cache flush, then in the next build BuildXL detects that the USN records of some output files may have changed because the OS flushed the page cache after the output file was tracked. Thus, BuildXL marks the pips that produced those output files dirty.
|
||||
|
||||
To keep flushing page cache on tracking or storing outputs to cache, one can pass `/flushPageCacheToFileSystemOnStoringOutputsToCache+`.
|
||||
|
||||
## Drops
|
||||
### Drops
|
||||
BuildXL has a distinct kind of pip, called an IPC pip, that is used to drop files to some artifact store. If a user modifies some input files, and re-runs the build that drops the produced output files, then we have the following issues:
|
||||
|
||||
1. If the drop name used by the current build is the same as that of the previous builds, then the file cannot be dropped because the previous build has already finalized the drop name.
|
||||
|
|
|
@ -1,4 +1,6 @@
|
|||
## Concept of weight
|
||||
# Pip weight
|
||||
|
||||
## Concept
|
||||
BuildXL has a limited number of process slots that can be used for process execution. This number is indicated by the argument `/maxproc` when running BuildXL. "Weight" is a concept that describes how resource-heavy a process is. It specifies how many process slots it requires to execute. The total "weight" of all processes running concurrently must be less than the number of available process slots. "Weight" defaults to 1, converts values < 1 to 1, and considers values >= available process slots to mean the process should run alone.
|
||||
|
||||
Weight can be statically configured, as the examples below describe, or use a dynamic value based on historical performance information. This historical information compares the CPU time used with a process pip's wall clock runtime. If the pip utilized 3 CPU cores over its wall clock runtime window, a value of 3 will be used. Accuracy is impacted by how the operating system allocates CPU time for the process.
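The rules above can be sketched in a few lines. This is an illustrative model, not the scheduler's code: the function names are hypothetical, rounding the CPU-to-wall-clock ratio is an assumption, and the slot check treats a total equal to the slot count as allowed so that a pip whose weight equals all slots can still run alone.

```python
def effective_weight(weight, max_proc):
    """Normalize a configured weight to the range [1, max_proc]."""
    if weight is None:
        return 1  # weight defaults to 1
    # Values below 1 become 1; values at or above the slot count take every
    # slot, which means the process runs alone.
    return max(1, min(int(weight), max_proc))


def can_schedule(running_weights, weight, max_proc):
    """Check whether a pip of the given weight fits in the free slots.

    Assumption: the total may equal max_proc.
    """
    return sum(running_weights) + effective_weight(weight, max_proc) <= max_proc


def historic_weight(cpu_seconds, wall_seconds):
    """Estimate weight as the average number of cores kept busy."""
    if wall_seconds <= 0:
        return 1
    return max(1, round(cpu_seconds / wall_seconds))
```

For example, a pip that consumed 30 seconds of CPU time during a 10 second run gets `historic_weight(30.0, 10.0) == 3`, matching the "3 CPU cores" case in the text.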
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
# Plugin Mode
|
||||
# Plugin mode
|
||||
|
||||
Plugin mode is a way to extend or change the default behavior of BuildXL. A plugin runs in a separate process. BuildXL communicates with the plugin over gRPC, so users can define RPC methods for their own plugins.
|
||||
|
||||
|
@ -10,7 +10,7 @@ Plugin mode is a way of providing the extensibilty of changing the default behav
|
|||
* BuildXL will choose the first plugin that can handle the request
|
||||
* BuildXL will have a plugin client to communicate with each plugin over gRPC (one client per plugin)
|
||||
|
||||
## Required Operations
|
||||
## Required operations
|
||||
A plugin implementation should conform to a set of RPC operations:
|
||||
1. `Start`: instruct plugin to start and load any necessary resources
|
||||
1. `Stop`: instruct plugin to stop and clean up
|
||||
|
|
|
@ -1,4 +1,7 @@
|
|||
## BuildXL Process Timeouts
|
||||
# BuildXL process timeouts
|
||||
|
||||
## Process timeouts
|
||||
|
||||
Process pips launched by BuildXL have configurable timeouts.
|
||||
The overall timeout will cause a pip to be terminated and considered a failure if exceeded.
|
||||
There is also a warning timeout that will be printed as a hint to users, to indicate that they are reaching a time limit.
|
||||
|
@ -38,5 +41,5 @@ The following happens when the timeout is reached:
|
|||
|
||||
To aid in discovering when pips are at the brink of the failure threshold, there is a second warning threshold that can be configured at a global and per-pip level similar to what is described above.
|
||||
|
||||
## Other Timeouts
|
||||
## Process injection timeouts
|
||||
There is also a "process injection" timeout. This is the amount of time BuildXL allows for spawning a process and for the process sandbox to inject itself into the running process. Generally this happens on the order of milliseconds, but on a very heavily loaded computer it may take much longer. This timeout is set to 10 minutes and is not user configurable.
|
||||
|
|
|
@ -1,3 +1,5 @@
|
|||
# Schedule prioritization
|
||||
|
||||
BuildXL aims to prioritize the critical path of a build to get the shortest end to end build time.
|
||||
|
||||
After BuildXL constructs or loads a build graph, it makes a pass across all pips that are filtered in and uses a heuristic for how long it expects the pip to run. With no additional information, that heuristic is based on the count of declared input and output files. The expected runtime for a pip will contain the expected runtime for all downstream pips. This expected runtime becomes the priority. It is static once assigned at the beginning of the execute phase.
|
||||
|
|
|
@ -1,4 +1,5 @@
|
|||
# Introduction
|
||||
# Sealed directories
|
||||
|
||||
Sealed Directories provide a way to treat a set of files under the same directory root as a single build artifact. This feature is extremely convenient since in BuildXL all process dependencies are defined in terms of build artifacts.
|
||||
|
||||
For example, it is not uncommon that many processes depend on a set of standard header files; instead of declaring a dependency for each process on each header file, it is much less verbose, hence more manageable, to define a single sealed directory containing all the header files, and then for each process declare that it depends on the sealed directory.
|
||||
|
|
|
@ -1,6 +1,8 @@
|
|||
# Source change affected inputs
|
||||
|
||||
Source change affected input computation is a feature of BuildXL that computes all the inputs of a process that are affected by source changes in the enlistment. BuildXL writes the full paths of the affected inputs of a process to a particular file indicated via the pip definition.
|
||||
|
||||
# Transitiveness
|
||||
## Transitiveness
|
||||
The source change impact propagates down the graph.
|
||||
|
||||
Suppose that we have the following pip dependency graph:
|
||||
|
@ -9,12 +11,13 @@ infileA <- (processA) <- outfileA <- (copyfile pip) <- outfileA-copy <- (process
|
|||
```
|
||||
When infileA was changed, the change affected input for processC is outfileB.
|
||||
|
||||
# Feature Adoption
|
||||
## Feature Adoption
|
||||
The feature is currently used for changelist code coverage. BuildXL computes the source change affected input list for the QTest pip. QTest only processes the files listed as affected when it computes code coverage results. This reduces QTest's instrumentation time.
|
||||
|
||||
# Enabling The Feature
|
||||
## Enabling The Feature
|
||||
BuildXL needs to know the source changes for the computation. BuildXL only performs the computation for a process that asks for its affected inputs by providing the path of a file into which the result can be written.
|
||||
## Source Change Tracking
|
||||
|
||||
### Source Change Tracking
|
||||
Currently, BuildXL doesn't check the source change of the enlistment itself. It requires source change provided through the command line argument `/inputChanges:<path-to-file-containing-change-list>`. Full paths of the changed source files should be listed in this file.
|
||||
|
||||
Example of the file content:
|
||||
|
@ -22,7 +25,7 @@ Example of the file content:
|
|||
D:\Git\MyRepo\src\fruit.cs
|
||||
D:\Git\MyRepo\src\bar.cpp
|
||||
```
|
||||
## Providing a file path for writing the computation result
|
||||
### Providing a file path for writing the computation result
|
||||
To enable the feature for a process, the spec author needs to provide a path to a file that the computation result will be written into. This file path is indicated by the `changeAffectedInputListWrittenFile` argument when calling `Transformer.execute()`.
|
||||
|
||||
Example of DScript code:
|
||||
|
@ -43,10 +46,9 @@ Example of Dscript code:
|
|||
});
|
||||
```
|
||||
|
||||
# Caching Behavior
|
||||
## Caching Behavior
|
||||
When a process asks for its source change affected inputs, BuildXL assumes that the process uses this information in some way. So it treats the computation result as an input of the process and includes it in the fingerprint. If the affected inputs of a process change, the process will get a cache miss. The process's caching behavior will therefore differ from when the feature is disabled.
|
||||
|
||||
## Example
|
||||
Suppose that we have the following pip dependency graph
|
||||
```
|
||||
infileA <-- (processA) <-- outfileA <--- (processC) <-- outfileC
|
||||
|
|
|
@ -1,47 +1,48 @@
|
|||
# Support Summary
|
||||
# Symlinks and junctions
|
||||
|
||||
| | Windows | macOS |
|
||||
|-----------|:-----------:|-----------|
|
||||
| File symlinks | Fully supported | Fully supported |
|
||||
| Directory symlinks | Partially supported; treated like junctions. See details below | Fully supported |
|
||||
| Directory junctions | Partially supported. See details below| N/A|
|
||||
|
||||
# File Symlinks
|
||||
## File Symlinks
|
||||
BuildXL supports file symlinks. A process pip can safely (a) consume file symlinks, including the files the symlinks point to, and (b) produce file symlinks. A copy-file pip has limited support for copying symlinks, in the sense that the symlink to be copied should only point to a "read-only" target, i.e., the target should not be produced during the build.
|
||||
|
||||
## Specification
|
||||
### Specification
|
||||
|
||||
### Input symlinks
|
||||
#### Input symlinks
|
||||
When a process pip wants to access a file via a file symlink or a chain of file symlinks, then all symlinks in the chain and the target file itself must be specified as dependencies. BuildXL enforces this requirement using its sandbox. BuildXL imposes this requirement to ease file change tracking.
|
||||
|
||||
### Output symlinks
|
||||
#### Output symlinks
|
||||
A process pip can produce a file symlink, and that file symlink may point to a non-existent target.
|
||||
|
||||
### Copying symlinks using copy-file pip
|
||||
#### Copying symlinks using copy-file pip
|
||||
Copying a symlink using a copy-file pip means copying the final target file of that symlink. This semantics requires that the final target file must exist, otherwise the copy-file pip fails. Since the copy-file pip only has a single dependency, on copying a symlink, BuildXL also requires that the target file and all symlinks in the chain to the target file, except the symlink to be copied itself, must be read-only, i.e., they are not produced during the build.
|
||||
|
||||
BuildXL imposes these requirements to avoid a race condition, which can then lead to unreliable and unpredictable builds. Suppose that BuildXL allowed the source of a copy-file to be a symlink that points to a file that is produced by another pip P. Since the copy-file pip only has a single dependency, the dependency between the copy-file pip and pip P cannot be established. If the copy-file pip executes first, then it will fail because the target file has not been produced yet.
|
||||
|
||||
## Hashing and tracking
|
||||
### Hashing and tracking
|
||||
|
||||
### Input symlinks
|
||||
#### Input symlinks
|
||||
For normal files, BuildXL uses content hashes of those files to perform up-to-date checks. If the content hashes change, then the consuming and the producing pips need to execute. Because file symlinks can point to non-existent targets, BuildXL instead hashes the paths to the immediate target files, but not the final target files. If the targets of symlinks change, then the consuming pips need to execute.
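The difference between hashing the immediate target and the final target can be sketched as follows. This is only an illustration of the idea: `hash_symlink` is a hypothetical helper, SHA-256 is an assumed hash, and the example relies on `os.symlink`, which may need extra privileges on Windows.

```python
import hashlib
import os
import tempfile


def hash_symlink(path: str) -> str:
    """Hash the path of a symlink's immediate target without following it."""
    target = os.readlink(path)  # works even if the target does not exist
    return hashlib.sha256(target.encode("utf-8")).hexdigest()


with tempfile.TemporaryDirectory() as root:
    first = os.path.join(root, "first")
    second = os.path.join(root, "second")

    # Chain: first -> second -> missing-v1.txt (the final target is absent).
    os.symlink("second", first)
    os.symlink("missing-v1.txt", second)
    first_before = hash_symlink(first)
    second_before = hash_symlink(second)

    # Re-route only the second link of the chain.
    os.remove(second)
    os.symlink("missing-v2.txt", second)
    first_after = hash_symlink(first)
    second_after = hash_symlink(second)
```

After re-routing, the hash of `second` changes while the hash of `first` stays the same, which is why every symlink in the chain has to be declared as a dependency for the change to be noticed.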
|
||||
|
||||
For a similar reason, when tracking file symlinks, BuildXL tracks the symlinks themselves, and not the target files. Note that, since we require the chain of symlinks to be specified as dependencies, any change to one of the symlinks in the chain can be detected by our file change tracker. Such detection is important for our [Incremental Scheduling](./Incremental-Scheduling.md) feature. In the future, we may want to relax this condition by only requiring users to specify the symlink and its final target as dependencies.
|
||||
|
||||
### Output symlinks
|
||||
#### Output symlinks
|
||||
Output symlinks are hashed and tracked in the same way as input symlinks. That is, BuildXL hashes the path to the immediate target of the output symlink and tracks the output symlink itself.
|
||||
|
||||
Currently our cache's content addressable store does not support storing file symlinks. To replay output symlinks from cache, we include the information indicating whether a file is a symlink file as well as the path to the immediate target, if symlink, into the pip metadata. This metadata is stored in the cache, and, with this metadata, BuildXL can replay output symlinks by re-creating them using this new information.
|
||||
|
||||
### Copying symlinks
|
||||
#### Copying symlinks
|
||||
Before copying a symlink, BuildXL validates that the symlink does not point to a file that is produced during the build. BuildXL then discovers and tracks the chain of symlinks as well as the target file itself.
|
||||
|
||||
## Command line configuration
|
||||
### Command line configuration
|
||||
BuildXL enforces chains of symlinks, as described above, by default. One can pass the unsafe flag `/unsafe_IgnoreReparsePoints` to disable this enforcement. This flag is unsafe because BuildXL may not be able to detect a change to a symlink in the middle of a chain of symlinks. This issue can result in an underbuild.
|
||||
|
||||
# Directory symlinks
|
||||
## Directory symlinks
|
||||
|
||||
## Complications introduced by directory symlinks
|
||||
### Complications introduced by directory symlinks
|
||||
|
||||
Without directory symlinks, the file system hierarchy is *tree*-shaped: every node (file or directory) except the root has exactly one parent node. In the presence of directory symlinks, the file hierarchy becomes a *directed graph*, and not necessarily an acyclic one either!
|
||||
|
||||
|
@ -69,7 +70,7 @@ If BuildXL were oblivious to this and tracked all those paths, that could lead t
|
|||
- **Redundant file materialization**: if BuildXL were to restore the `PluginManager.framework` directory layout from the example above, and its "Path Set" contained all 3 different paths to file `Versions/B/Resources/Info.plist` (e.g., because that file was accessed via all those paths), BuildXL could end up materializing the same file 3 different times.
|
||||
- **Order constraints during file materialization**: to materialize Resources/Info.plist, BuildXL would have to make sure that it first materialized both Resources and Current symlinks.
|
||||
|
||||
## Solution
|
||||
### Solution
|
||||
All the problems identified in the previous section can be mitigated by following this principle:
|
||||
|
||||
> BuildXL should never observe or track any paths that go through any intermediate directory symlinks.
|
||||
|
@ -84,13 +85,13 @@ In other words, instead of observing path lookups requested by the process at th
|
|||
```
|
||||
BuildXL captures each of the "readlink" and "stat" operations above and treats them as "read" dependencies.
|
||||
|
||||
## Prototypical use case
|
||||
### Prototypical use case
|
||||
|
||||
The most user-friendly way to use this feature is to have a *producer* pip that creates a directory layout with arbitrary symlinks inside, and declares the root of that directory as an opaque directory output. The consumer pips can then specify a single dependency on that opaque directory artifact which allows them to perform arbitrary path lookups within that directory. This use case is supported by BuildXL's ability to use dynamically observed file accesses as the real pip dependencies, so that the user doesn't have to explicitly specify them all.
|
||||
|
||||
In other cases, it is incumbent upon the user to specify file dependencies exactly how BuildXL will observe them, meaning declaring read access on all symlinks that may be resolved during path lookups as well as declaring permissions for the final file in terms of its physical path (one that doesn't contain any intermediate symlinks).
|
||||
|
||||
## Summary
|
||||
### Summary
|
||||
- Symlinks are treated as files (regardless of what they point to)
|
||||
- Paths observed and tracked by BuildXL never contain any intermediate directory symlinks
|
||||
- During pip process execution, every time a directory symlink is resolved, BuildXL detects that and captures the path to that directory symlink as a read dependency
|
||||
|
@ -104,21 +105,21 @@ In other cases, it is incumbent upon the user to specify file dependencies exact
|
|||
- BuildXL observes that the process read `Versions/Current`: that path is not specified by the user so BuildXL reports it as a read violation
|
||||
- BuildXL observes that the process read `Versions/B/Resources/Info.plist`: that path is not specified by the user so BuildXL reports it as a read violation.
|
||||
|
||||
# Junctions
|
||||
## Junctions
|
||||
In builds, particularly BuildXL builds, junctions are mostly used to avoid changing specification files and, in turn, make a previously built pip graph reusable. For example, one can create a junction from a NuGet package directory `NuGetCache\PackageX` to a directory `NuGetCache\PackageX-1.0` where all files of `PackageX` version `1.0` are located. All paths referring to `PackageX` in the spec files are in terms of the unversioned path `NuGetCache\PackageX`. If the user wants to test a new version of `PackageX-2.0`, then the user simply re-routes the junction from `NuGetCache\PackageX` to `NuGetCache\PackageX-2.0` without changing the spec files.
|
||||
|
||||
## Supported scenarios
|
||||
### Supported scenarios
|
||||
BuildXL has limited support for junctions. BuildXL currently only supports input file accesses via junctions. BuildXL does not support junction productions.
|
||||
For input file accesses via junctions, BuildXL handles junctions that cross volume boundaries, e.g., a junction from `X:\A\B` to `Y:\A\B`. However, BuildXL does not infer and track all incarnations of paths that are caused by junctions. For example, given a junction `D` to `D'`, where `D'` can also be a junction, if BuildXL is requested to track a file `D\f.txt`, then it just tracks `D\f.txt` and not `D'\f.txt`. Thus, any change to `D'`, like re-routing junction target if `D'` is a junction, will not be detected by BuildXL.
|
||||
|
||||
## Directory translations
|
||||
### Directory translations
|
||||
Accessing files via junctions can cause file access violations. Let's consider again the junction from `NuGetCache\PackageX` to a directory `NuGetCache\PackageX-1.0`. The tool that accesses files in that package may open a file by specifying a path containing `NuGetCache\PackageX-1.0` (e.g., the tool calls `GetFinalPathNameByHandle`). However, because the spec files do not contain paths containing `NuGetCache\PackageX-1.0`, BuildXL will report a file access violation.
|
||||
|
||||
To resolve this issue, BuildXL provides a directory translation feature. In the above case, the user specifies `/translateDirectory:NuGetCache\PackageX-1.0<NuGetCache\PackageX` in the command line. With this directory translation, whenever BuildXL sees a path containing `NuGetCache\PackageX-1.0`, like `NuGetCache\PackageX-1.0\f.txt`, it modifies it into `NuGetCache\PackageX\f.txt`. Now, since `NuGetCache\PackageX\f.txt` is specified in the spec file, BuildXL will no longer see file access violations.
|
||||
|
||||
One can create a chain of directory translations, and BuildXL will validate that the translations are acyclic. For example, suppose that `D` is a junction to `D'`, which is in turn a junction to `D''`, and all paths in the spec file are in terms of `D`. One can then have the directory translations `/translateDirectory:D''<D'` and `/translateDirectory:D'<D` to resolve file access violations. Given a path, if two directory translations are possible, BuildXL respects the order in which these translations are specified on the command line.
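The translation step can be sketched as an ordered list of rewrites. This is a simplified model, not BuildXL's implementation: it matches only on directory prefixes, compares paths case-sensitively, applies each translation once in command-line order, and uses the hypothetical helpers `parse_translation` and `translate`.

```python
def parse_translation(arg: str):
    """Parse the 'source<target' value of a /translateDirectory option."""
    source, target = arg.split("<", 1)
    return source, target


def translate(path: str, translations):
    """Apply translations in the order given on the command line."""
    for source, target in translations:
        # Match whole directory names only, so 'PackageX-1.0' does not
        # match 'PackageX-1.0.1'.
        if path == source or path.startswith(source + "\\"):
            path = target + path[len(source):]
    return path


# Single translation from the NuGet example above.
nuget = [parse_translation("NuGetCache\\PackageX-1.0<NuGetCache\\PackageX")]

# Chain from the junction example: D'' -> D' -> D.
chain = [parse_translation("D''<D'"), parse_translation("D'<D")]
```

With these lists, `translate("NuGetCache\\PackageX-1.0\\f.txt", nuget)` yields `NuGetCache\PackageX\f.txt`, and `translate("D''\\f.txt", chain)` yields `D\f.txt` after both steps.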
|
||||
|
||||
## Junction tracking implementation
|
||||
### Junction tracking implementation
|
||||
Internally BuildXL maintains a map from `FileID` to `(Path, USN)` for tracking existing files and directories. In tracking a file `A\B\C\f.txt`, BuildXL tracks the file as well as its parent directories. That is, the tracker will have the following mappings:
|
||||
```
|
||||
FileID(A) -> (A, USN(A))
|
||||
|
|