Because:
- The CI job `Create FxA Image` ran out of memory
This Commit:
- Applies the `--max-old-space-size=7168` node option, which should prevent the OOM.
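One way to confirm the option took effect is to read V8's heap limit at runtime. The sketch below is illustrative only: the helper name `heapLimitMb` is hypothetical and is not part of this commit.

```typescript
import { getHeapStatistics } from 'node:v8';

// Hypothetical helper: reports V8's configured heap size limit in MiB.
// With --max-old-space-size=7168 the value should land somewhat above 7168,
// because the limit also covers the other heap spaces.
function heapLimitMb(): number {
  return Math.round(getHeapStatistics().heap_size_limit / (1024 * 1024));
}

console.log(`V8 heap limit: ${heapLimitMb()} MiB`);
```

Running this inside the CI job, with the same node options the job uses, would show whether the larger limit is actually being picked up.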
Because:
- Not all CI operations were taking advantage of Nx caches
This Commit:
- Adjusts nx.json config to ensure correct caching behavior
- Adjusts nx.json config to ensure proper target chains, i.e. build depends on pre-build
- Uses nx to directly run integration tests
- Uses nx to directly run unit tests
- Phases out dependency on generated .list files
- Ensures 'start up' for services for functional tests is done with nx and uses nx caches.
- Creates nx tags to facilitate test commands
- Fixes linter errors encountered
- Updates docker images to nx for builds
- Updates nx ignore files
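To illustrate the target chain idea (build depending on pre-build), here is a small sketch that resolves execution order from a `dependsOn` map shaped like the `targetDefaults` section of nx.json. The target names and the `resolveOrder` helper are assumptions for illustration; the actual nx.json in this commit may differ.

```typescript
// Shape loosely modeled on the `targetDefaults` section of nx.json.
type TargetDefaults = Record<string, { dependsOn?: string[] }>;

// Hypothetical config: build runs after prebuild, unit tests run after build.
const exampleTargetDefaults: TargetDefaults = {
  prebuild: {},
  build: { dependsOn: ['prebuild'] },
  'test-unit': { dependsOn: ['build'] },
};

// Returns the targets that must run, in order, to execute `target`.
function resolveOrder(
  target: string,
  defaults: TargetDefaults,
  seen: Set<string> = new Set<string>()
): string[] {
  if (seen.has(target)) return []; // already scheduled (also guards against cycles)
  seen.add(target);
  const entry = defaults[target];
  const deps = entry && entry.dependsOn ? entry.dependsOn : [];
  const order: string[] = [];
  for (const dep of deps) {
    order.push(...resolveOrder(dep, defaults, seen));
  }
  order.push(target);
  return order;
}

console.log(resolveOrder('test-unit', exampleTargetDefaults));
// [ 'prebuild', 'build', 'test-unit' ]
```

Nx performs this resolution itself; the sketch only shows why declaring the chain once in config lets every downstream target reuse the same cached upstream results.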
Because:
* We want to use the latest typescript features.
This commit:
* Updates to TypeScript 5.2.2.
* Updates to Node 18.17.1 as OOM would occur frequently with 18.14.
* Updates CI to use a large instance with increased memory for Node.
Because:
* On stack startup, services start before mysql patches have completed,
causing profile-server to fail on startup.
This commit:
* Adds script to check if patcher script has started and is still
running.
Closes #
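The wait logic can be sketched as a polling loop. This is a minimal illustration, not the script added by this commit: `isPatcherRunning` is a hypothetical probe supplied by the caller, the interval and timeout defaults are assumptions, and the sketch only covers the "still running" half of the check.

```typescript
// Polls `isPatcherRunning` until it reports false or the timeout elapses.
// Resolves true if the patcher finished, false if we gave up waiting.
async function waitForPatcher(
  isPatcherRunning: () => Promise<boolean>,
  intervalMs: number = 1000,
  timeoutMs: number = 60000
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (!(await isPatcherRunning())) return true;
    // Sleep before probing again so we don't spin.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

Services that depend on patched tables, such as profile-server, would start only after this resolves true.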
Because:
- We want to lock down our gql queries as part of good security practices.
This Commit:
- Creates task that extracts existing gql queries from our code
- Applies middleware that checks an allowlist of extracted queries.
- Returns a 403 if the query is not in the allowlist
- Makes all gql parsable by extraction utility. The extraction utility doesn't support string placeholders. e.g. gql` Account { ${ACCOUNT_FIELDS} }`
- Moves all the gql in the admin server into .ts files. Again for consistency and extraction utility support.
- Cleans up some config references at startup
- Adds nx config for extraction & copy tasks
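The allowlist check can be sketched independently of any web framework. This is a simplified illustration under stated assumptions: matching by whitespace-normalized query text is an assumption (the real middleware may compare queries differently), and the query names below are hypothetical.

```typescript
// Collapses whitespace so formatting differences don't affect matching.
function normalizeQuery(query: string): string {
  return query.replace(/\s+/g, ' ').trim();
}

// Builds the allowlist from queries extracted from the code base.
function buildAllowlist(extracted: string[]): Set<string> {
  return new Set(extracted.map(normalizeQuery));
}

// Returns the HTTP status a guard middleware would respond with.
function checkQuery(allowlist: Set<string>, query: string): 200 | 403 {
  return allowlist.has(normalizeQuery(query)) ? 200 : 403;
}

const gqlAllowlist = buildAllowlist(['query GetAccount { account { uid email } }']);
console.log(checkQuery(gqlAllowlist, 'query GetAccount {\n  account { uid email }\n}')); // 200
console.log(checkQuery(gqlAllowlist, 'query Other { secrets }')); // 403
```

This also shows why placeholders such as `${ACCOUNT_FIELDS}` are a problem: a static extractor cannot know the final query text, so the served query would never match an extracted entry.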
Because:
- Deployed services all had a version set to v0.0.0
This Commit:
- Determines version specified in `packages/version.json`
- Provides this version to `build.sh` ensuring it is set in packages
Because:
* We want to introduce nx to the repository
This commit:
* Adds nx with working builds
Closes: FXA-7341
Co-authored-by: dschom <dschomburg@mozilla.com>
Because:
- We want to clean up our docker build
- We no longer use our service-specific images
This Commit:
- Switches to multi stage docker build
- Removes unused building of individual services
- Removes unused script that sets environment variables.
- Removes `workspaces focus` call that resulted in redundant installs
- Removes trap on build logs.
- Uses circleci DLC to get cache hits for fxa-utils and fxa-build-utils
- Removes scripts that build service specific images
- Cleans up naming conventions to reflect that only one image is being built
Because:
- We want the package.json version to match the git tag being deployed.
- We don't want to commit a change on the package.json file to accomplish this.
This Commit:
- If there is a git tag, the code will send it into the docker build as a VERSION build arg.
- This build arg is then used to update the version field in all package.json files.
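The version update step can be sketched as a pure function over a parsed package.json. This is illustrative only: the `v` prefix on the tag, the example package name, and both helper names are assumptions, not details taken from this commit.

```typescript
interface PackageManifest {
  name: string;
  version: string;
  [key: string]: unknown;
}

// Strips a leading "v" from a tag such as "v1.2.3" (tag format is an assumption).
function versionFromTag(tag: string): string {
  return tag.replace(/^v/, '');
}

// Returns a copy of the manifest with its version replaced; the input is not mutated.
function applyVersion(pkg: PackageManifest, version: string): PackageManifest {
  return { ...pkg, version };
}

const originalManifest: PackageManifest = { name: 'fxa-example', version: '0.0.0' };
const updatedManifest = applyVersion(originalManifest, versionFromTag('v1.2.3'));
console.log(updatedManifest.version); // 1.2.3
```

Because the rewrite happens inside the docker build, the package.json files in the repository stay untouched, which is the second goal stated above.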
Because:
- we want to use Glean for our metrics
This commit:
- integrates glean.js into content-server
- adds a couple placeholder registration Glean metrics events
Because:
- we don't want to maintain the fxa eslint plugin in another repo
This commit:
- moves the plugin into the monorepo
Co-authored-by: Peter deHaan <peter@deseloper.com>
Because:
- we want to use the latest node LTS major version
This commit:
- upgrades FxA to use node 18, with two workarounds
  - Webpack uses a hash algorithm that's no longer supported by default
    in node 17+, causing build failures; --openssl-legacy-provider is
    used as the workaround
  - dns.lookup in node 17+ by default returns the results in the same
    order as they are from the resolver, which could lead to
    'localhost' resolving to ::1; --dns-result-order=ipv4first is used
    as the workaround
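One way to apply both flags without clobbering options that are already set is to merge them into a `NODE_OPTIONS` string. The sketch below is an assumption about how the flags might be combined; the commit message does not say where they are actually set, and `withNode18Workarounds` is a hypothetical helper.

```typescript
// The two flags named in this commit message.
const NODE18_WORKAROUND_FLAGS: string[] = [
  '--openssl-legacy-provider',
  '--dns-result-order=ipv4first',
];

// Appends any missing workaround flags to an existing NODE_OPTIONS value.
function withNode18Workarounds(existing: string = ''): string {
  const current = existing.split(/\s+/).filter((flag) => flag.length > 0);
  for (const flag of NODE18_WORKAROUND_FLAGS) {
    if (current.indexOf(flag) === -1) current.push(flag);
  }
  return current.join(' ');
}

console.log(withNode18Workarounds('--max-old-space-size=4096'));
// --max-old-space-size=4096 --openssl-legacy-provider --dns-result-order=ipv4first
```

The function is idempotent, so it is safe to call from several start scripts that may each inherit the previous value.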
Because:
- Nightly CI build was failing because fxa-shared now references the test folder during build.
This Commit:
- Removes tests directories from the docker ignore list.
Because:
- Something was off for base-install in jobs using the ci-base-browser-latest images. Yarn cache hits were not happening as anticipated. Also, in the event the yarn install could be skipped, the postinstall script was taking longer than expected.
- An occasional race condition where the deploy-fxa-ci-browser-image could be built off a stale base image was detected.
- Since ci-base-latest and ci-base-browsers-latest weren’t pushed at the same time, occasionally different builds would be used in a pipeline. Although this never directly led to problems, it did seem like the potential was there.
- The ci-* docker images were larger than desired.
- While investigating optimizations, it became apparent job spin up times could be further reduced by using CircleCI workspaces.
- Nightly builds are useful, and we would occasionally encounter a regression on main that wasn’t present in PRs due to differences in resulting state after a merge.
- SRE needs a better way to trigger smoke tests in parallel.
This commit:
- Reworks the docker file.
- Puts everything into one multi stage build, which results in better layer caching.
- Creates new tag names ci-builder, ci-test-runner, & ci-functional-test-runner
- Makes sure these images are as small as possible
- Updates references to these new images in the executors
- Ensures that the ci images are built on the same machine and pushed at the same time. This addresses the potential race condition described in the because section above.
- Short circuits the docker build for ci images if there are no npm package changes detected. Since the base images are really just a way to decrease build time by caching package dependencies in a docker image that compresses better than the circle ci cache, we don’t actually need to build the image unless a package change has occurred.
- Leverages CircleCI workspaces. The build now primes the workspace and it is then restored in the subsequent jobs. This ends up saving time, because we skip the build / install step, and it allows us to use much smaller docker images for any job running post build.
- Adds a script to extract the current state of the yarn cache from a docker image, so the yarn cache can be kept fresh. This was an oversight in the initial pass.
- For functional tests, the operation that starts the pm2 stack and the actual jobs have been separated into different steps. This gives us better timing metrics, and also lets us see which step fails most often. Starting the stack actually takes up a considerable amount of time, and shouldn’t be confused with the time it takes to execute tests.
- For functional tests (both playwright and content), a memory optimization was made by not including fxa-shared or fxa-react in the ‘run start’ operation. Both of these workspaces are now built in the ‘build’ stage, so there is no reason to include them when spinning up the stack. This actually resulted in a noticeable reduction in memory usage. This is probably due to the fact that these pm2 tasks were running a watch operation, which is more memory intensive than just running a build.
- Ensures that the entire git history isn't copied into the base image, which results in a smaller image. (Our git history is surprisingly large!) This is done by setting a depth of 1 when cloning and depth of 2 when fetching. These changes also address FXA-6676, because the clone and fetch operations have been modified.
- Adds a nightly workflow, so that we can run a full test suite to guard against regression resulting from a merge into main. See FXA-6626 for more information. We can also use this nightly flow to postpone any CI tasks that run on main, but aren’t urgent.
- Retains the ability to manually trigger tasks that were shifted to the nightly workflow. These tasks include deploying storybook or deploying packages.
- Adds the ability to trigger smoke test workflows with pipeline parameters. It’s possible this will simplify executing smoke tests in parallel for SRE.
- Fixes the issue of slow base-install script performance when the yarn install operation could be skipped. The issue was that when we invoke postinstall directly, fxa-shared would be built. This is already taken care of in the build step, so doing so in the base install was an unneeded redundancy.
- Adds a few other minor improvements and fixes, such as avoiding a couple of redundant build / lint operations, making the build script a bit more robust, and cleaning up the config file a bit.
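The short circuit for CI image builds can be sketched as a fingerprint comparison. How this commit actually detects package changes is not stated, so hashing the dependency manifests is an assumption, and both helper names are hypothetical.

```typescript
import { createHash } from 'node:crypto';

// Fingerprints dependency manifests by hashing their paths and contents
// in a stable (sorted) order.
function fingerprintManifests(files: Record<string, string>): string {
  const hash = createHash('sha256');
  for (const filePath of Object.keys(files).sort()) {
    hash.update(filePath).update('\0').update(files[filePath]).update('\0');
  }
  return hash.digest('hex');
}

// Rebuild only when the fingerprint differs from the one recorded for the last image.
function needsImageRebuild(
  previousFingerprint: string | undefined,
  files: Record<string, string>
): boolean {
  return previousFingerprint !== fingerprintManifests(files);
}
```

A caller would pass the contents of files such as package.json and yarn.lock, then skip the docker build whenever `needsImageRebuild` returns false.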
Because:
- We wanted to run a few preliminary checks before proceeding to more
expensive CI jobs. Checks include:
- Compiling typescript in commonly referenced workspace packages
- Linting code that has changed
- Executing Unit Tests for code that has changed
- We wanted to partition test operations into unit tests, and
integration tests. Unit tests can be run relatively quickly and
require no additional infrastructure. Integration tests require
additional infrastructure and generally have longer execution
times. Now that jobs are blocked from running until preliminary
checks pass, one of which is unit tests, it is important to draw a
distinction between these two types of tests.
- We want to avoid unnecessary yarn installs and typescript
compilations, which are time consuming.
- We want to make sure that test results are published and failing tests
can be easily viewed in the CI.
This Commit:
- Creates a build-and-validate job in the CI that builds, lints, and
unit tests code prior to running any other jobs.
- Creates unit-test job in CI config
- Creates integration-test job in CI config
- Removes redundant calls to compile workspace packages. These
are now built up front, cached, and restored as needed for future
runs.
- Extends the create-lists script functionality to generate commands
that can be executed with the parallel command.
- Removes unnecessary yarn install operations. Invoking yarn workspace
focus results in a yarn install. In the case of running tests this is largely
unnecessary, because we already do a yarn install in the base-install
step.
- Makes sure test results are exported as junit xml so the CI can report
back on tests that were failing. This was done for a couple workspace
packages, but many were lacking the capability. All test:unit and
test:integration npm scripts now export this data.
- Fixes the following issues encountered along the way:
- Adds logs to monitor heap usage of jest tests. Some
jest tests are still using a lot of memory.
- Moves a few slow / long running tests from unit test to
integration tests.
- Ensures that jest.transform for ts-jest always sets the config
option isolatedModules to true. This definitely decreases memory
overhead and resolves some of the OOM errors we were hitting. It
was configured in some places but not everywhere.
- Exports test results files for all tests
- Exports all test artifacts
- Uses GNU parallel to run tests in parallel. Turns out yarn
workspaces foreach would give a false positive when an OOM
was encountered. Fortunately, the parallel command offered an
acceptable workaround, and even offers some nice features
like the load argument, which allows us to control test execution
a bit more efficiently.
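The Jest settings described above can be sketched as a config object. This is a hedged example rather than the configuration from this commit: the transform pattern is an assumption, and `jest-junit` is assumed as the reporter because the message only says results are exported as junit xml.

```typescript
// Sketch of a Jest config; in a real project this object would be the
// default export of jest.config.ts.
const jestConfigSketch = {
  // Print per-file heap usage so memory-hungry suites stand out in the logs.
  logHeapUsage: true,
  transform: {
    // isolatedModules makes ts-jest transpile each file on its own,
    // skipping full type-checking and lowering memory use.
    '^.+\\.tsx?$': ['ts-jest', { isolatedModules: true }],
  },
  // Emit JUnit XML so the CI can list failing tests
  // (assumes the jest-junit package is installed).
  reporters: ['default', 'jest-junit'],
};

console.log(JSON.stringify(jestConfigSketch, null, 2));
```

Note that with `isolatedModules` enabled, type errors no longer fail the test run, so a separate type-check step (such as the up-front TypeScript compile) remains necessary.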
Because:
- We want to speed up pipeline startup.
This Commit:
- Optimizes docker base image builds for the smallest images possible.
- Enables hard links for yarn cache
- Enables global yarn cache
- Avoids needlessly installing playwright browsers.
- Avoids needlessly running yarn install.
- Uses mozilla/fxa-circleci:ci-base-latest image for running test pipelines
- Uses mozilla/fxa-circleci:ci-base-browsers image for running functional tests pipelines
- Creates configurable executors that can be reused across pipelines
- Upgrades to yarn 3.3.0
- Enables direct checkout of PR code to test, which is faster than CircleCI’s checkout command.
- Upgrades functional test to X-Large. This was already the case for playwright tests, but is now extended to content server tests too. This decision was made due to running lots of pipelines and realizing flakiness was largely due to CPU or memory hitting 100% for long periods of time.
- Turns off tracing, since it saves a bit of runtime.
Because:
- We are in the process of experimenting with CI pipeline optimizations
This Commit:
- Creates a new docker image that can be used in other CI jobs.
- This base image comes pre-populated with node_modules, dist folders, and l10n repos.
Because:
- When starting tracing with open telemetry, the network name would sometimes collide and result in an error.
This Commit:
- Sets an 'fxa' network name
- Applies this network to services
- Allows services to communicate by their names
Because:
- We want to start trace capture at client side
- We can't report trace data directly to cloud trace
This Commit:
- Adds support for client side tracing in tracing/browser-tracing.ts
- Refactors fxa-shared/tracing to ensure code is reused
- Cleans up tests
- Introduces traceparent header and tracestate html headers
- Opens up OTLP ports on jaeger
- Introduces a clientName to trace config
- Adds an open telemetry collector service (otel-collector)
- Uses this service for all trace capture
- Adds flow id to trace id headers so traces can be validated
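For reference, a `traceparent` value follows the W3C Trace Context layout `version-traceId-parentId-flags`. The sketch below parses and validates that layout; it is a standalone illustration, not the code in tracing/browser-tracing.ts, and `parseTraceParent` is a hypothetical helper.

```typescript
interface TraceParent {
  version: string;
  traceId: string;
  parentId: string;
  flags: string;
}

// W3C Trace Context layout: 2, 32, 16, and 2 lowercase hex characters.
const TRACEPARENT_PATTERN =
  /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns the parsed fields, or null when the header is malformed.
function parseTraceParent(header: string): TraceParent | null {
  const match = TRACEPARENT_PATTERN.exec(header.trim());
  if (!match) return null;
  const [, version, traceId, parentId, flags] = match;
  // All-zero trace and parent ids are invalid per the spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { version, traceId, parentId, flags };
}

const parsedTraceParent = parseTraceParent(
  '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'
);
console.log(parsedTraceParent ? parsedTraceParent.traceId : 'invalid');
// 4bf92f3577b34da6a3ce929d0e0e4736
```

A server receiving this value from the client can continue the same trace by reusing `traceId` and treating `parentId` as the parent span.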
Because:
- When not enabled, Jaeger was exiting with an 'error' status, which was confusing.
This commit:
- Adjusts infrastructure.config.js so that Jaeger exits cleanly with a 'stopped' status.
- Reports a better log message so that it's clear Jaeger is not required to run FxA.
Because:
- We want to be able to see trace information
- We want to be able to export trace information to google cloud
This Commit:
- Adds utility class to fxa-shared for configuring and initializing open telemetry
- Initializes open telemetry in the auth server
- Forces resolution of google-gax.
- Forces resolution of @grpc/grpc-js.
- Adds the jaeger docker container for viewing traces locally. This is accessible on localhost:16686.
Note, adding @google-cloud/opentelemetry-cloud-trace-exporter resulted
in version conflicts for google-gax and @grpc/grpc-js, which caused
runtime failures when interacting with @google-cloud/firestore. After
many experiments, this seems to do the trick.
On M1 we currently sometimes need to 'yarn restart services' after a 'yarn start' before things start working. Maybe if we wait for the containers to be responsive before starting services we can avoid this. The important one seems to be mysql.