ANGLE Wrangling

As an ANGLE Sheriff, your job is to:

  1. Keep the ANGLE Standalone and ANGLE/Chromium waterfalls green.
  2. Ensure developers have reliable pre-commit testing via the ANGLE Standalone and ANGLE/Chromium try waterfalls.
  3. Control and monitor the ANGLE auto-rollers.
  4. Act as a point of contact for the Chromium Sheriff and other teams monitoring ANGLE regressions.
  5. Monitor and respond to ANGLE's Perf alerts. Note: this is currently not working.

If you're not an ANGLE team member, you can contact us on the public ANGLE project Google group.

Note: Please review and if needed update the wrangler schedule.

Note: If you need to suppress test failures (e.g. to keep an auto-roller unblocked), see Handling Test Failures.

Task: Monitor ANGLE CI and Try Testers

Your first job is to keep the ANGLE builders green and unblocked.

Post-commit CI builders

There are two consoles for ANGLE continuous integration builders:

We recommend you track ANGLE build failures via Sheriff-o-matic. Bookmark the link and check it regularly during your shift. Note: flaky failures currently show up as separate failure instances.

We expect these waterfalls to be as "green" as possible. As a wrangler please help clean out any failures by finding and reverting problematic CLs, suppressing flaky tests that can't be fixed, or finding other solutions. We aim to have zero failing builds, so follow the campsite rule and leave the waterfall cleaner than when you started your shift.

When you encounter red builds or flakiness, search for existing bugs first. If none matches, file an ANGLE bug and set the label Hotlist-Wrangler.

See more detailed instructions on ANGLE testing by following this link.

Pre-commit try builders

In addition to the CI builders, we have a console for try jobs on the ANGLE CV (change verifier):

Failures are expected on this waterfall as developers test WIP changes. However, you must act on any persistent flakiness or failure that causes developer drag by filing bugs, reverting CLs, or taking other action as with the CI waterfall.

If you find a failure that is unrelated to ANGLE, file a Chromium bug. Set the bug label Hotlist-PixelWrangler. Ensure you cc the current ANGLE and Chrome GPU wranglers, which you can find by consulting build.chromium.org. For more information see Filing Chromium Bug Reports below.

You can optionally follow Chromium bugs in the Internals>GPU>ANGLE component to be alerted to reports of ANGLE-related breakage in Chrome.

NOTE: When all builds seem to be purple or otherwise broken:

This could be a major infrastructure outage. File a high-priority bug using g.co/bugatrooper.

Filing Chromium Bug Reports

The GPU Pixel Wrangler is responsible for Chromium bugs. Please file Chromium issues with the Label Hotlist-PixelWrangler for bugs outside of the ANGLE project.

IMPORTANT info to include in Chromium bug reports:

  • Links to all first failing builds (e.g. first windows failure, first mac failure, etc).
  • Related regression ranges. See below on how to determine the ANGLE regression range.
  • Relevant error messages.
  • Set the Components to one or more values, such as (start typing "Internals" and you'll see choices):
    • Internals>GPU for general GPU bugs
    • Internals>GPU>Testing for failures that look infrastructure-related
    • Internals>GPU>ANGLE for ANGLE-related Chromium bugs
    • Internals>Skia for Skia-specific bugs
  • Cc relevant sheriffs or blame suspects, as well as yourself or the current ANGLE Wrangler.
  • Set the Hotlist-PixelWrangler Label.

How to determine the ANGLE regression range on Chromium bots:

  1. Open the first failing and last passing builds.
  2. For test failures: record parent_got_angle_revision in both builds.
  3. For compile failures: record got_angle_revision in both builds.
  4. Create a regression link with this URL template: https://chromium.googlesource.com/angle/angle.git/+log/<last passing revision>..<first failing revision>
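
The steps above can be sketched as a small helper. This is only an illustration, not an official ANGLE tool: the function names `revision_property` and `regression_range_url` are hypothetical, and the revisions are assumed to be the values you copied by hand from the build properties of the two builds.

```python
# Hypothetical helper for building an ANGLE regression-range link.
# The URL template and property names come from the steps above;
# everything else (function names, argument names) is illustrative.

ANGLE_LOG_URL = "https://chromium.googlesource.com/angle/angle.git/+log/"


def revision_property(failure_kind: str) -> str:
    """Return the build property to record for a 'test' or 'compile' failure."""
    properties = {
        "test": "parent_got_angle_revision",
        "compile": "got_angle_revision",
    }
    if failure_kind not in properties:
        raise ValueError(f"unknown failure kind: {failure_kind!r}")
    return properties[failure_kind]


def regression_range_url(last_passing: str, first_failing: str) -> str:
    """Build the log URL covering last passing .. first failing revisions."""
    last_passing = last_passing.strip()
    first_failing = first_failing.strip()
    if not last_passing or not first_failing:
        raise ValueError("both revisions are required")
    return f"{ANGLE_LOG_URL}{last_passing}..{first_failing}"


if __name__ == "__main__":
    # Placeholder revisions; substitute the hashes from the two builds.
    print(revision_property("test"))
    print(regression_range_url("<last passing revision>", "<first failing revision>"))
```

Paste the resulting link into the bug report so that reviewers can see every ANGLE CL in the suspect range.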

Task: The Auto-Rollers

The ANGLE into Chrome auto-roller automatically updates Chrome with the latest ANGLE changes.

The ANGLE into Android auto-roller updates Android with the latest ANGLE changes. You must manually approve and land these rolls.

We also use additional auto-rollers to roll third party libraries into ANGLE:

Roller health: You will be cc'ed on all rolls. Please check failed rolls to verify there is no blocking breakage.

For all rollers, you can trigger manual rolls using the dashboards to land high-priority changes, such as Chromium-side test expectation updates or suppressions. When a roll fails, stop the roller, determine if the root cause is a problem with ANGLE or with the upstream repo, and file an issue with an appropriate next step.

The autoroller configurations live in the skia-autoroll-internal-config repository. Feel free to maintain these configs yourself, or file a Skia autoroll bug for help as needed.

Vulkan Dependencies auto-roller: Handling failures

Vulkan-deps consists of several related Vulkan dependencies: Vulkan-Tools, Vulkan-Loader, Vulkan-ValidationLayers, Vulkan-Headers and other related repos. One common source of breaks is a Vulkan Headers update, which can take a while to be integrated into other repos like the Vulkan Validation Layers. No action on your part is needed for header updates.

If a vulkan-deps AutoRoll CL triggers a failure in the presubmit bot, in the "export targets" step, you can:

  1. Add missing headers to the upstream BUILD.gn if possible. See this example CL.
  2. Otherwise, add headers to IGNORED_INCLUDES in export_targets.py. See this example CL.

If the roll fails for a reason other than a header update or presubmit, determine the correct upstream repo and file an issue upstream. For simple compilation failures, we usually submit fixes ourselves. For more info on vulkan-deps see the README.

ANGLE into Chrome auto-roller: SwANGLE builders

The ANGLE into Chromium roller has two SwiftShader + ANGLE (SwANGLE) builders: linux-swangle-try-x64 and win-swangle-try-x86. Note that failures on these bots may be due to SwiftShader changes rather than ANGLE changes.

To handle failures on these bots:

  1. If possible, suppress the failing tests in ANGLE, opening a bug to investigate later.
  2. If you suspect an ANGLE CL caused a regression, consider whether reverting it or suppressing the failures is a better course of action.
  3. If you suspect a SwiftShader CL and the breakage is too severe to suppress (many tests fail in multiple suites), consider reverting the responsible SwiftShader roll into Chromium and opening a SwiftShader bug. SwiftShader rolls into Chromium should fail afterwards, but if the bad roll manages to reland, stop the autoroller as well.

Task: Monitor and respond to ANGLE's perf alerts

Any large regressions should be triaged with a new ANGLE bug linked to any suspected CLs that may have caused performance to regress. If it's a known/expected regression, the bug can be closed as such. The tests are very flaky right now, so a WontFix resolution is often appropriate.