# backfill

A JavaScript caching library for reducing build time.

**🔌 Easy to install**: Simply wrap your build commands inside
`backfill -- [command]`

**☁️ Remote cache**: Store your cache on Azure Blob or as an npm package

**⚙️ Fully configurable**: Smart defaults with cross-package and per-package
configuration and environment variable overrides

_backfill is under active development and should probably not be used in
production yet. We will initially focus on stability improvements. We will look
into various optimization strategies, adding more customization, and introducing
an API for only running scripts in packages that have changed and skipping
others altogether._

_Current prerequisites:_

- git (for running `--audit`)
- yarn.lock and yarn workspaces (for optimized hashing)

These prerequisites can easily be loosened to make backfill work with npm, Rush,
and Lerna.
## Why

When you're working in a multi-package repo, you don't want to rebuild packages
that haven't changed. By wrapping your build scripts inside `backfill`, you
enable storing and fetching of build output to and from a local or remote cache.

Backfill is based on two concepts:

1. **Hashing**: It will hash the files of a package, its dependencies and the
   build command.
2. **Caching**: Using the hash key, it will look for build output from a local
   or remote cache. If there's a match, it will backfill the package using the
   cache. Otherwise, it will run the build command and persist the output to the
   cache.
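To make the two concepts concrete, here is a minimal sketch in plain Node.js. It is not backfill's implementation: it assumes a package's inputs are just a list of file paths, that "its dependencies" can be represented by their already computed hashes, and it uses SHA-1 and an in-memory `Map` as the cache. The names `computePackageHash` and `buildOrBackfill` are hypothetical.

```js
const crypto = require("crypto");
const fs = require("fs");

// Hypothetical helper: hash a package's files, the hashes of its
// dependencies and the build command into one cache key.
function computePackageHash(files, dependencyHashes, command) {
  const hash = crypto.createHash("sha1");
  // Sort so the key does not depend on the order the files were listed in.
  for (const file of [...files].sort()) {
    hash.update(file + "\0"); // separator keeps path and content apart
    hash.update(fs.readFileSync(file));
  }
  for (const depHash of [...dependencyHashes].sort()) {
    hash.update(depHash);
  }
  // Changing the command (e.g. adding a flag) must produce a new key.
  hash.update(command);
  return hash.digest("hex");
}

// Hypothetical cache flow: restore output on a hit, otherwise build and store.
function buildOrBackfill(cache, key, build) {
  if (cache.has(key)) {
    return { hit: true, output: cache.get(key) };
  }
  const output = build();
  cache.set(key, output);
  return { hit: false, output };
}
```

Because the command is part of the key, running `tsc -b` and `tsc -b --verbose` over the same files would be cached separately.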
## Install

Install backfill using yarn:

```
$ yarn add --dev backfill
```
## Usage - CLI

```
backfill -- [command]
```

Typically you would wrap your npm scripts inside `backfill`, like this:

```json
{
  "name": "package",
  "scripts": {
    "build": "backfill -- tsc -b"
  }
}
```
### `--audit`

Backfill can only bring back build output from the folders it was asked to
cache. A package that modifies or adds files outside of the cached folder will
not be brought back to the same state as when it was initially built. To help
you debug this, you can add `--audit` to your `backfill` command. It will listen
to all file changes in your repo (it assumes you're in a git repo) while running
the build command and then report on any files that got changed outside of the
cache folder.
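The check that `--audit` reports on can be pictured as a filter over the list of changed files. The sketch below is hypothetical: it assumes the changed paths are already known (for example, collected from git), are relative to the package root with forward slashes, and that every output glob has the simple `dir/**` form. Backfill's real implementation may work differently.

```js
// Hypothetical sketch: report changed files that are not covered by any
// output glob. Only globs of the form "some/dir/**" are supported here.
function filesOutsideOutput(changedFiles, outputGlob) {
  const prefixes = outputGlob.map((glob) => {
    if (!glob.endsWith("/**")) {
      throw new Error(`unsupported glob in this sketch: ${glob}`);
    }
    // "lib/**" -> "lib/"
    return glob.slice(0, -2);
  });
  // Anything not under one of the cached folders would not be restored.
  return changedFiles.filter(
    (file) => !prefixes.some((prefix) => file.startsWith(prefix))
  );
}
```

For example, a build that writes `src/generated.ts` next to its `lib/**` output would have that file reported, because restoring from the cache would not bring it back.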
### Configuration

Backfill will look for `backfill.config.js` in the package it was called from
and among parent folders recursively, and then combine those configs together.

To configure backfill, simply export a config object with the properties you
wish to override:

```js
module.exports = {
  cacheStorageConfig: {
    provider: "azure-blob",
    options: { /* ... */ }
  }
};
```
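The config lookup described above can be sketched as follows. This is an illustration under assumptions, not backfill's code: it lists every candidate `backfill.config.js` path from the filesystem root down to the package and merges the configs it finds with a shallow `Object.assign`, so that a config closer to the package overrides one further up. The merge depth and precedence order are assumptions, and `candidateConfigPaths`, `mergeConfigs` and `loadConfig` are hypothetical names.

```js
const fs = require("fs");
const path = require("path");

// List candidate config paths, ordered from the filesystem root down to
// the package folder.
function candidateConfigPaths(packageRoot) {
  const dirs = [];
  let dir = path.resolve(packageRoot);
  while (true) {
    dirs.unshift(dir);
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return dirs.map((d) => path.join(d, "backfill.config.js"));
}

// Shallow merge: later configs (closer to the package) win.
function mergeConfigs(defaults, configs) {
  return Object.assign({}, defaults, ...configs);
}

// Load every config file that exists and merge them over the defaults.
function loadConfig(packageRoot, defaults) {
  const configs = candidateConfigPaths(packageRoot)
    .filter((file) => fs.existsSync(file))
    .map((file) => require(file));
  return mergeConfigs(defaults, configs);
}
```

With a shallow merge, a package that sets `cacheStorageConfig` replaces the parent's whole `cacheStorageConfig` object rather than merging into it, which is the simplest behavior to reason about.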
The default configuration object is:

```js
{
  cacheStorageConfig: { provider: "local" },
  clearOutputFolder: false,
  internalCacheFolder: "node_modules/.cache/backfill",
  logFolder: "node_modules/.cache/backfill",
  logLevel: "info",
  mode: "READ_WRITE",
  name: "[name-of-package]",
  outputGlob: ["lib/**"],
  packageRoot: "path/to/package",
  producePerformanceLogs: false,
  validateOutput: false
}
```
The `outputGlob` is a list of globs describing the files you want to cache.
The globs in `outputGlob` should be expressed as relative paths from the root of
each package. If you want to cache `package-a/lib`, for instance, you'd write
`outputGlob: ["lib/**"]`. If you also want to cache the `package-a/dist/bundles`
folder, you'd write `outputGlob: ["lib/**", "dist/bundles/**"]`.

The configuration type is:

```ts
export type Config = {
  cacheStorageConfig: CacheStorageConfig;
  clearOutputFolder: boolean;
  internalCacheFolder: string;
  logFolder: string;
  logLevel: LogLevels;
  mode: "READ_ONLY" | "WRITE_ONLY" | "READ_WRITE" | "PASS";
  name: string;
  outputGlob: string[];
  packageRoot: string;
  performanceReportName?: string;
  producePerformanceLogs: boolean;
  validateOutput: boolean;
};
```
#### Environment variables

You can override configuration with environment variables. Backfill will also
look for a `.env` file in the root of your repository and load its contents into
the environment. This can be useful when you don't want to commit keys and
secrets for your remote cache, or if you want to commit a read-only cache access
key in the repo and override it with a read and write access key in the PR
build, for instance.

See `getEnvConfig()` in
[`./packages/config/src/envConfig.ts`](https://github.com/microsoft/backfill/blob/master/packages/config/src/envConfig.ts#L15).
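The sketch below shows how two of these variables, `BACKFILL_CACHE_PROVIDER` and `BACKFILL_CACHE_PROVIDER_OPTIONS` (used in the remote cache sections of this README), could be turned into a partial config that overrides the file-based one. It is a hypothetical simplification, not a copy of `getEnvConfig()`: the function names are made up, only these two variables are handled, and loading the `.env` file itself is left out.

```js
// Hypothetical sketch: derive a partial backfill config from environment
// variables. Only the two cache-provider variables are handled here.
function configFromEnv(env) {
  const config = {};
  const provider = env.BACKFILL_CACHE_PROVIDER;
  if (provider) {
    config.cacheStorageConfig = { provider };
    // The options variable holds a JSON object, e.g. '{"container":"..."}'.
    const rawOptions = env.BACKFILL_CACHE_PROVIDER_OPTIONS;
    if (rawOptions) {
      config.cacheStorageConfig.options = JSON.parse(rawOptions);
    }
  }
  return config;
}

// Environment values are applied on top of the file-based config.
function applyEnvOverrides(fileConfig, env = process.env) {
  return { ...fileConfig, ...configFromEnv(env) };
}
```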
## Set up remote cache

### Microsoft Azure Blob Storage

To cache to Microsoft Azure Blob Storage, you need to provide a connection
string and the container name. If you are configuring via `backfill.config.js`,
you can use the following syntax:

```js
module.exports = {
  cacheStorageConfig: {
    provider: "azure-blob",
    options: {
      connectionString: "...",
      container: "...",
      maxSize: 12345
    }
  }
};
```
#### Options

<dl>
  <dt>connectionString</dt>
  <dd>retrieve this from the Azure Portal interface</dd>
  <dt>container</dt>
  <dd>the name of the blob storage container</dd>
  <dt>maxSize (<em>optional</em>)</dt>
  <dd>
    max size of a single package cache, in the number of bytes
  </dd>
</dl>

You can also configure Microsoft Azure Blob Storage using environment variables.

```
BACKFILL_CACHE_PROVIDER="azure-blob"
BACKFILL_CACHE_PROVIDER_OPTIONS='{"connectionString":"...","container":"..."}'
```
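To get a feel for the `maxSize` option listed above, the sketch below sums the sizes of the files under a package's output folder and compares the total with a byte limit. It is hypothetical: `folderSize` and `fitsInCache` are made-up names, and what backfill actually does when a package exceeds `maxSize` (for example, skipping the upload) is an assumption, not something documented here.

```js
const fs = require("fs");
const path = require("path");

// Recursively sum the size in bytes of every file under `dir`.
function folderSize(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += folderSize(full);
    } else if (entry.isFile()) {
      total += fs.statSync(full).size;
    }
  }
  return total;
}

// Hypothetical check: an undefined maxSize means "no limit".
function fitsInCache(outputDir, maxSize) {
  return maxSize === undefined || folderSize(outputDir) <= maxSize;
}
```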
### npm package

To cache to an npm package, you need to provide a package name and the registry
URL of your package feed. This feed should probably be private. If you are
configuring via `backfill.config.js`, you can use the following syntax:

```js
module.exports = {
  cacheStorageConfig: {
    provider: "npm",
    options: {
      npmPackageName: "...",
      registryUrl: "..."
    }
  }
};
```
You can also provide a path to the `.npmrc` user config file, to supply auth
details for your package feed, using the `npmrcUserconfig` field in `options`.

You can also configure the npm package cache using environment variables.

```
BACKFILL_CACHE_PROVIDER="npm"
BACKFILL_CACHE_PROVIDER_OPTIONS='{"npmPackageName":"...","registryUrl":"..."}'
```
### Skipping cache locally

Sometimes in a local build environment, it is useful to compare hashes to
determine whether to execute the task, without having to explicitly use a
separate directory for the cache.

One caveat: this strategy relies on the output that the task already produced,
and that output could have been modified in a local development environment. For
this reason, this is an opt-in behavior rather than the default.

The main benefit of using this strategy is a **significant** speed boost.
Backfill can skip copying the cached outputs if it can rely on the built
artifacts. Hashing is CPU-bound while caching is I/O-bound, so skipping the copy
removes the I/O cost. This speed gain comes at the cost of needing to trust that
the outputs have not been altered by the user. While this is usually true, it is
prudent to also provide a command in your repository to clean the output along
with the saved hashes.

You can configure this from the `backfill.config.js` file this way:

```js
module.exports = {
  cacheStorageConfig: {
    provider: "local-skip"
  }
};
```
As with the other providers, you can also use an environment variable to choose
this storage strategy:

```
BACKFILL_CACHE_PROVIDER="local-skip"
```
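The idea behind `local-skip` can be sketched as a hash file stored next to the package: if the hash recorded after the last successful build equals the current hash, the build is skipped and the output already on disk is trusted, so nothing is copied. This illustrates the strategy only; it is not backfill's implementation, and the `.backfill-hash` file name and `runIfChanged` helper are hypothetical.

```js
const fs = require("fs");
const path = require("path");

// Hypothetical marker file holding the hash of the last successful build.
const HASH_FILE = ".backfill-hash";

// Run `build` only if `currentHash` differs from the recorded hash.
// Returns true if the build ran, false if it was skipped.
function runIfChanged(packageRoot, currentHash, build) {
  const hashFile = path.join(packageRoot, HASH_FILE);
  const previous = fs.existsSync(hashFile)
    ? fs.readFileSync(hashFile, "utf8")
    : null;
  if (previous === currentHash) {
    return false; // trust the output already on disk
  }
  build();
  // Record the hash only after the build succeeded.
  fs.writeFileSync(hashFile, currentHash);
  return true;
}
```

This is why a clean command should remove the saved hashes as well as the output: deleting only the output would leave a matching hash behind, and the next run would skip the build.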
## API

Backfill provides an API, which allows for more complex scenarios and
performance optimizations.

```js
const backfill = require("backfill/lib/api");

// `getPath` and `runBuildCommand` are placeholders for your own code.
async function buildPackage(packageName) {
  const packagePath = getPath(packageName);

  const logger = backfill.makeLogger("verbose", process.stdout, process.stderr);
  const packageHash = await backfill.computeHash(packagePath, logger);

  // Restore the build output from the cache if the hash matches.
  const fetchSuccess = await backfill.fetch(packagePath, packageHash, logger);

  if (!fetchSuccess) {
    // Cache miss: build, then store the output under the same hash.
    await runBuildCommand();
    await backfill.put(packagePath, packageHash, logger);
  }
}
```
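When the same fetch-or-build sequence is needed for several packages, it can be wrapped in a helper. The sketch below is hypothetical: `buildWithCache` and `buildAll` are not part of backfill, and they take the API object as a parameter, using only the `computeHash`, `fetch` and `put` calls shown above, so that they can be exercised with a stub. Packages are processed one after another, which respects dependency order if the list is already sorted that way.

```js
// Hypothetical helper around the calls shown above. `api` is expected to
// provide computeHash(path, logger), fetch(path, hash, logger) and
// put(path, hash, logger), all returning promises.
async function buildWithCache(api, packagePath, runBuild, logger) {
  const hash = await api.computeHash(packagePath, logger);
  if (await api.fetch(packagePath, hash, logger)) {
    return { packagePath, fromCache: true };
  }
  await runBuild(packagePath);
  await api.put(packagePath, hash, logger);
  return { packagePath, fromCache: false };
}

// Build packages sequentially, e.g. in dependency order.
async function buildAll(api, packagePaths, runBuild, logger) {
  const results = [];
  for (const packagePath of packagePaths) {
    results.push(await buildWithCache(api, packagePath, runBuild, logger));
  }
  return results;
}
```

With the real API, `api` would be the object returned by `require("backfill/lib/api")` and `logger` the one created with `makeLogger`.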
## Performance Logs

You can optionally output performance logs to disk. If turned on, backfill will
output a log file after each run with performance metrics. Each log file is
formatted as a JSON file. You can turn on performance logging by setting
`producePerformanceLogs: true` in `backfill.config.js`.
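Because each log is a JSON file, the logs can be inspected with a few lines of Node.js. This sketch assumes the logs are the `.json` files at the top level of the configured `logFolder` (by default `node_modules/.cache/backfill`). The file naming and the fields inside each report are not documented here, so it only parses the files; note that any other JSON files in that folder would be picked up too. `readPerformanceLogs` is a hypothetical name.

```js
const fs = require("fs");
const path = require("path");

// Hypothetical helper: parse every .json file in `logFolder`.
// Returns an empty list if the folder does not exist yet.
function readPerformanceLogs(logFolder) {
  if (!fs.existsSync(logFolder)) {
    return [];
  }
  return fs
    .readdirSync(logFolder)
    .filter((name) => name.endsWith(".json"))
    .sort()
    .map((name) => ({
      file: name,
      report: JSON.parse(fs.readFileSync(path.join(logFolder, name), "utf8"))
    }));
}
```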
## Contributing

### Ways to contribute

This project welcomes contributions and suggestions.

- [Submit bugs](https://github.com/microsoft/backfill/issues) and help us verify
  fixes as they are checked in.
- Review the [source code changes](https://github.com/microsoft/backfill/pulls).
### Describing your changes

When submitting source code changes, be sure to accompany the changes with a
change file. Change files can be generated with the `yarn change` command.
### Contributor License Agreement (CLA)

Most contributions require you to agree to a Contributor License Agreement (CLA)
declaring that you have the right to, and actually do, grant us the rights to
use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether
you need to provide a CLA and decorate the PR appropriately (e.g., status check,
comment). Simply follow the instructions provided by the bot. You will only need
to do this once across all repos using our CLA.

This project has adopted the
[Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the
[Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any
additional questions or comments.