Release Process
Environments
- Production - Run manually by ENGR team (unless we need Env Var or other infrastructure changes)
- Stage - Run automatically on PR merges
- Locals - Run by ENGRs on their own devices (see the README and other docs/).
Development
Standard Monitor development follows a branching strategy similar to GitHub Flow, where all branches stem directly from main and are merged back to main:
- Create a branch from main
- Make changes
- Create a pull request to main
- Address review comments
- Merge the pull request
%%{init: { 'theme': 'base', 'gitGraph': {'rotateCommitLabel': true} } }%%
gitGraph
commit
branch change-1
commit
commit
checkout main
merge change-1
This means many features could be in development at the same time, and all can merge back to main when they are ready.
%%{init: { 'theme': 'base', 'gitGraph': {'rotateCommitLabel': true} } }%%
gitGraph
commit
branch change-1
commit
commit
checkout main
branch change-2
commit
checkout main
merge change-1
branch change-3
commit
commit
checkout main
merge change-2
checkout change-3
commit
commit
checkout main
Release Timeline
The standard release interval for Monitor is one week, meaning at least one new version of the Monitor web app should reach the Production environment each week. However, since we started doing 1-click deploys to production, releases can happen more frequently.
Preview Deployment
Every time a PR is opened, a docker image is created and deployed to the preview deployment environment powered by GCP Cloud Run / CloudSQL. A brand new database is created and schema is migrated specific to that PR. A brand new Cloud Run service is set up and cleaned up along with the database at the end of the lifecycle of that PR (when closed or merged). A preview URL is generated and linked in the PR when the environment is set up and changes are ready to be reviewed.
Release to Stage
Every commit to main is automatically deployed to the Stage server via GitHub Actions and Jenkins.
PR Merges
PRs can only be merged once they pass all the required checks:
- Lint
- Build
- Unit Tests
- E2E Tests
- Deploy Previews
A PR also needs at least one approval from an ENGR team member to be merged into main.
Once a PR is successfully merged:
- Ensure that the merge commit in the main branch passes all checks and a docker image is successfully deployed.
- Jenkins will kick off the deployment of the latest built docker image to the stage environment.
- A webhook will send status messages into the #fx-monitor-engineering channel.
  - Watch for messages: "pushing to staging started" and "successfully deployed to staging"
Release to Production
Daily pre-releases
GitHub pre-releases are generated daily via the daily-pre-release GHA workflow. Each pre-release includes all the PRs that were merged into main (stage) that day. The cron job only runs Monday to Friday. As a base load engineer, you can get into the habit of checking pre-releases every morning to see what has been pushed and what has been checked by QA. Once everything in a pre-release is checked off, we can deploy that pre-release to production and mark it as the latest.
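As a rough illustration of the schedule described above, the sketch below checks whether the pre-release cron would run on a given date and formats a date-based tag. The `YYYY.MM.DD` format is an assumption inferred from the example tag 2024.09.01 used later in this document, and both function names are hypothetical; the actual workflow may compute its tag differently.

```python
from datetime import date


def cron_runs_on(day: date) -> bool:
    """Return True if the daily pre-release cron would run (Monday to Friday)."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return day.weekday() < 5


def release_tag(day: date) -> str:
    """Format a date-based tag. YYYY.MM.DD is an assumed format, not confirmed."""
    return day.strftime("%Y.%m.%d")


if __name__ == "__main__":
    today = date.today()
    if cron_runs_on(today):
        print(f"Expect a pre-release tagged {release_tag(today)}")
    else:
        print("No pre-release is generated on weekends")
```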
Deploy to Production
Before deploying to production, we need to assess the current state of our work on stage. We need to cross-reference what's already on stage and what's been greenlit by QA. To do this, we need to find the difference between what was released last time in production and what we currently have on stage.
Mark pre-release as latest, check the diff in Release Notes, and notify the team
- Find the pre-release/tag you want to use for the deploy (e.g., 2024.09.01)
- Edit the release
- Check the checkbox "Set as the latest release"
- Copy and paste the release notes into the engineering Slack channel so the team is aware
- Go through the PRs, cross-reference the tickets in the PRs with the Jira board to see if QA has approved the tickets. If anything is unclear, make sure to tag the author of the PR.
- If anything has not been properly tested, make a note, and again, double check with the person
- If everything looks good, proceed to release; otherwise refer to the section Stage-fixes below.
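The cross-referencing step in the list above can be sketched as a small helper. This is only an illustration: the `MergedPR` type, the `unapproved_prs` function, and the ticket keys are hypothetical, and in practice the data is read by hand from the release notes and the Jira board.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MergedPR:
    number: int               # PR number as listed in the release notes
    author: str               # person to tag if anything is unclear
    tickets: tuple[str, ...]  # Jira ticket keys referenced by the PR


def unapproved_prs(prs, qa_approved):
    """Return (pr, missing_tickets) pairs for PRs that QA has not fully approved."""
    blockers = []
    for pr in prs:
        missing = [t for t in pr.tickets if t not in qa_approved]
        if missing:
            blockers.append((pr, missing))
    return blockers


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    prs = [
        MergedPR(101, "alice", ("ABC-1",)),
        MergedPR(102, "bob", ("ABC-2", "ABC-3")),
    ]
    for pr, missing in unapproved_prs(prs, qa_approved={"ABC-1", "ABC-2"}):
        print(f"PR #{pr.number}: ask {pr.author} about {', '.join(missing)}")
```

An empty result means every ticket in the pre-release has been approved; anything else is a list of people to follow up with before releasing.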
Update Production Environment Variables
In the cases where we need to update or add new environment variables, we need to get help from SRE:
- File an SRE ticket for the env var change.
- In the title, make sure to mention "Production"
- Make sure to include the value and the correct variable name
- Make sure to specify if it's a secret or a regular variable
- When appropriate, wait for SRE to make the changes before proceeding with the production release.
1-click Production Release
After you push the tag to GitHub, you should also make a release on GitHub for the tag.
- After all the checks above look good, click "Publish release"
- Go to the main branch and make sure all the checks succeeded
- Go to DockerHub to ensure that a tag with today's date is present.
- Run E2E cron against stage (with the latest update)
  - If there are errors, make sure the cause is understood
  - Fix the E2E errors or change the tests when appropriate before proceeding
- Check the stage Sentry and GCP error logs
- Run the 1-Click Deploy GitHub Action
  - Click on "Run workflow"
  - Make sure Branch: main is selected
  - Make sure prod is selected for environment
  - Input the tag created earlier (today's date, e.g., 2024.09.01)
  - Click on "Run workflow" when ready
- A webhook will send status messages into the #fx-monitor-engineering channel.
  - Watch for messages: "pushing to production started" and "successfully deployed to production"
- After a successful deploy, conduct some basic sanity checks:
  - Check the Sentry prod project for a spike in any new issues
  - Check the Grafana dashboard for any unexpected spike in ops
  - Spot-check the site for basic functionality
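For illustration, watching the channel for the two status messages amounts to the check below. This is a minimal sketch that assumes the messages contain exactly the phrases quoted in this document; the real webhook text may differ, and `deploy_status` is a hypothetical helper, not part of the deploy tooling.

```python
STARTED = "pushing to production started"
SUCCEEDED = "successfully deployed to production"


def deploy_status(messages):
    """Classify the latest deploy from channel messages in chronological order."""
    started_at = None
    for index, text in enumerate(messages):
        if STARTED in text:
            started_at = index  # remember the most recent start message
    if started_at is None:
        return "not started"
    # Only a success message after the most recent start counts.
    if any(SUCCEEDED in text for text in messages[started_at + 1:]):
        return "deployed"
    return "in progress"


if __name__ == "__main__":
    log = [
        "pushing to production started",
        "successfully deployed to production",
    ]
    print(deploy_status(log))
```

If the status stays "in progress" for an unusually long time, that is a cue to look at the workflow run before starting the sanity checks.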
Update Jira
On our Jira board, review the tickets listed under "Merged to main." If those were included in the release you just created, drag those tickets to either the "Promoted to Prod" or "Done" column. This will notify QA that they can verify the behavior on Prod if necessary.
If you're unsure whether a ticket was included in the release, ask the assigned person to move it if needed.
Stage-fixes
Ideally, every change can ride the regular weekly release "trains". But sometimes, not everything in main can go out. Since we've adopted feature flags, these scenarios are becoming rarer. However, we still cannot guarantee that they never happen.
Wherever feature flags aren't applicable, there are generally two scenarios we need to consider:
- If the diff in changes is minimal (e.g., can be traced back to a PR or two), the easiest way is to revert
- If the diff is not minimal, or a significant portion of the tickets haven't been QA'd:
- we can choose to delay the release (ask the team for consensus)
- we can create a separate release branch
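The two scenarios above can be summarized as a small decision helper. This is a sketch only: the `revert_limit` of 2 is an assumption taken from "a PR or two", the function name is hypothetical, and the real decision also depends on team consensus and on how much of the release has been QA'd.

```python
def stage_fix_strategy(blocking_prs: int, revert_limit: int = 2) -> str:
    """Suggest how to handle PRs on stage that cannot go to production yet.

    blocking_prs: number of merged PRs that are not ready and not feature-flagged.
    revert_limit: assumed cutoff for a "minimal" diff (a PR or two).
    """
    if blocking_prs < 0:
        raise ValueError("blocking_prs must be non-negative")
    if blocking_prs == 0:
        return "release"
    if blocking_prs <= revert_limit:
        return "revert"
    # Larger diffs: either delay the release or cut a separate release branch.
    return "delay or release branch"


if __name__ == "__main__":
    for count in (0, 1, 5):
        print(count, "->", stage_fix_strategy(count))
```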
Revert
- Revert the PR(s)
- Create a GitHub Release
- Revert the revert after production deployment is successful
  - After the revert of the revert is successfully merged into main, stage should automatically return to its state before the Production release
Separate release branch
- Create a branch on top of main
- Work on taking out the features that should not be included (not feature-flagged)
- Create a GitHub Release
  - In the release, make sure to pick your branch (main is default)
  - Generate the release note, double check and make sure that it makes sense
- Proceed with the production release
Future
After adding 1-click production deploy capability and broadly adopting feature flags, we are looking into ways to increase our production release frequency. The main challenge here is to coordinate our QA effort with our latest stage CI/CD deployments.