FluidFramework/scripts
Alex Villarreal c126bba051
refactor(ci): Upload stage telemetry separately for each stage (#22616)
## Description

This refactors how we upload telemetry for stages during ADO pipeline
runs.

One of the main problems with the current setup is that the stage that
uploads telemetry only runs after all "target" stages (the ones whose
telemetry we want to upload) in a given pipeline have completed. In the
E2E tests pipeline, the stage that runs tests against ODSP usually takes
~2hrs to run but can also wait many hours to _start_: we have exclusive
locks so only one pipeline run can execute tests against a given
external service at a time, and thus the stage has to wait for the
corresponding stage in previous pipeline runs to finish. As a result,
the telemetry for other stages can lag severely behind the time when
things actually happened. This can cause confusion when our OCEs get
IcM incidents, because the thing that caused the incident to fire
happened many hours ago (sometimes the previous day).

The refactor in this PR makes it so that, instead of having a single
stage at the end of a pipeline run which uploads the telemetry for all
other relevant stages in that run, we now have one upload stage for each
of the relevant "target" stages. Each new stage depends only on its
"target" stage, so it runs immediately after it. This does have the
disadvantage that we now have many more stages in a pipeline run: each
one needs to be scheduled on an available build agent, and they all run
similar steps, like checking out the repository. So the total usage time
for build agents will probably go up a bit. The monetary cost should not
be significant, though, so I think this is fine. All of this applies to
the test pipelines; I kept the existing setup for Build - client because
there we don't really care about tracking each stage separately, only
about the pipeline as a whole.
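
To illustrate the dependency shape described above, here is a minimal
sketch in JavaScript. It is not the actual pipeline template from this
PR; `buildUploadStages`, the `upload_telemetry_` prefix, and the example
stage ids are all hypothetical names used only for illustration.

```javascript
// Hypothetical model of the per-stage layout: one upload stage per
// "target" stage, each depending only on its own target.
// The names used here are illustrative, not taken from the repository.
function buildUploadStages(targetStageIds) {
  return targetStageIds.map((stageId) => ({
    // Assumed naming scheme for the upload stage.
    stage: `upload_telemetry_${stageId}`,
    // Depends only on the target stage, so it does not have to wait
    // for slower, unrelated stages in the same pipeline run.
    dependsOn: [stageId],
  }));
}

// Example with made-up stage ids.
const uploadStages = buildUploadStages(["tests_local", "tests_odsp"]);
console.log(JSON.stringify(uploadStages, null, 2));
```

With this shape, the upload stage for a fast target can run while a
slow target (such as one waiting on an exclusive lock) is still pending.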

The refactor also entailed some cleanup and improvements to the JS/TS
code related to stage telemetry. The scripts that get the run
information for a stage can now take a specific `STAGE_ID` from the
environment instead of getting the list of stage ids themselves. They
also validate the inputs they expect more aggressively.
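
As a rough sketch of what reading and validating `STAGE_ID` could look
like (this is not the code from the PR), consider the helper below.
`getStageId` is a hypothetical function, and the allowed-character rule
is an assumption made for the example.

```javascript
// Hypothetical helper: reads STAGE_ID from an env-like object and
// fails fast when the value is missing or malformed.
function getStageId(env) {
  const stageId = env.STAGE_ID;
  if (typeof stageId !== "string" || stageId.trim() === "") {
    throw new Error("STAGE_ID must be set to a non-empty value.");
  }
  // Assumption for this sketch: stage ids contain only letters,
  // digits, and underscores.
  if (!/^[A-Za-z0-9_]+$/.test(stageId)) {
    throw new Error(`STAGE_ID '${stageId}' contains unexpected characters.`);
  }
  return stageId;
}

// Example with a made-up stage id; a real script would pass process.env.
console.log(getStageId({ STAGE_ID: "tests_odsp" }));
```

Taking the environment as a parameter keeps the helper easy to test;
failing early with a clear message avoids uploading telemetry for the
wrong stage or for no stage at all.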
2024-09-26 10:38:46 -05:00
formatting tools(client+build-tools): update Biome to 1.8.3 (#21869) 2024-07-12 21:26:58 +00:00
check-package-version.cjs ci: Fix version comparison in release step (#14701) 2023-03-22 11:18:37 -07:00
get-test-pass-rate.mjs refactor(ci): Upload stage telemetry separately for each stage (#22616) 2024-09-26 10:38:46 -05:00
only-pnpm.cjs ci: Use corepack to control package manager versions (#14935) 2023-04-03 13:47:59 -07:00
pack-packages.sh ci: Use outFile flag instead of stdout redirection in flub list (#22319) 2024-08-27 22:00:02 +00:00
report-parser.js Enforce test output (#17623) 2023-10-10 15:00:20 +00:00
update-package-version.sh build: Replace deprecated flag in pipeline (#16205) 2023-07-11 15:52:52 -07:00