governance/wg-releases/retros/9.0.0-beta.1.md

3.6 KiB

Delayed Ship Date Retro - Electron v9.0.0-beta.1

February 5th, 2020

What Happened

Per our pre-published release schedule, the Electron team planned to release the first beta version of Electron 9.0.0 on February 5th, 2020. We triggered the first release attempt at 1:45 PM PST. This release failed at 2:44 PM PST. A fix for this failure was posted at 8:51 AM PST on February 6th, 2020.

Unfortunately this fix was not complete and a secondary fix was issued at 12:19 PM PST on February 6th, 2020. In testing the secondary fix, several other issues came to light. These issues were triaged and successfully resolved by 7:08 AM PST on February 7th.

Due to it being the weekend, the next release wasn't triggered until Monday, February 10th. 9.0.0-beta.1 was successfully released on February 10th, 2020 at 11:05 AM PST.

Incident Causes

The first failure was caused by a change in the location of the chromedriver release asset. This issue was fixed via a PR but that PR was not tested against a release build before merging. After merging a test release build was run which revealed that the first fix wasn't sufficient. A secondary PR was issued to resolve this issue.

When the secondary fix was tested against a release build, two new issues surfaced:

1. Our macOS builds were failing due to running out of disk space.  
2. Our Linux x64 build was producing a 1.2GB mksnapshot.zip file.    

Troubleshooting these issues took a long time due to how slow our macOS release jobs take.

The out of disk space issue on macOS was caused by our use of an older macOS SDK and upgrading to a newer macOS SDK resolved the issue. The large mksnapshot.zip issue was caused by an issue in how files get stripped on Linux. This issue was resolved by updating the code that strips files on Linux. Once these were resolved we were able to successfully release.

How This Could Have Been Prevented

In order to prevent this from happening we could have run test release builds on the 9-x-y branch before February 5th, 2020 to ensure that the release was ready to go. Additionally, the nightly release off of master before the 9-x-y branch was cut could have been checked to verify that it was successful. There is also the #bot-nightly-ci Slack channel that could have been checked before 9-x-y was cut to verify whether the nightly release builds were building and running properly.

Additionally, we could have saved some time by testing the initial fix against release builds before merging the PR.

Future Actions

In the future, there are several key actions we need to take to prevent this from occurring again.

Before cutting the release branch for the new beta release line from master, we will verify that the automatic nightly release builds on master have built correctly on CI and that the tests run in the #bot-nightly-ci` have passed.

Once the release branch has been cut, we will trigger test release builds against that branch at least 24 hours ahead of the beta release date.

PRs fixing release issues will be tested against release builds before merging those PRs so that we can verify the validity of the fix.

In order to better support debugging and troubleshooting of release builds in the future, we have already updated our CircleCI release process to run more quickly and to be more amenable to debug release issues.