When mach errors out, an error report is sent to Sentry. This error
report contains information about the state of the interpreter during
the failure, details about the environment, installed packages and more.
Having this information available immediately when attempting to resolve
a bug report is generally desirable, instead of going through a back-and-forth
needinfo tag on Bugzilla or spending time asking the reporter questions on
Matrix.
This commit captures the Sentry ID returned from `sentry_sdk.capture_exception`
and prints it to the screen. If a user adds this line to their bug report (as
the error messages suggest), a build team member can enter this ID into
Sentry to identify the exact report and debug the error. At minimum this will
reduce the amount of back-and-forth between the reporter and the assignee
required to resolve a bug. Optimally it should make bugs easier to spot and
reduce the time spent on end user support requests.
To use the Sentry ID to identify information about a specific bug report, the
bug assignee should open the Mozilla Sentry page for the `mach` project and
paste the ID into the search box, which will produce the full stack trace with
all submitted information.
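
A minimal sketch of the capture-and-print flow described above. `sentry_sdk.capture_exception` is the call named in this commit; here it is passed in as the `capture` parameter so the sketch runs without Sentry configured. The helper name `report_and_print_id`, the stand-in `fake_capture`, and the message wording are hypothetical.

```python
import io
import sys


def report_and_print_id(exc, capture, out=sys.stderr):
    """Report `exc` via `capture` and print the returned event ID.

    `capture` is expected to behave like `sentry_sdk.capture_exception`:
    it takes the exception and returns an event ID string, or None if
    nothing was sent.
    """
    event_id = capture(exc)
    if event_id is not None:
        # Hypothetical wording; the real message text is not shown here.
        print("Sentry event ID: {}".format(event_id), file=out)
    return event_id


# Stand-in for sentry_sdk.capture_exception, used only for this demo.
def fake_capture(exc):
    return "0123456789abcdef0123456789abcdef"


buf = io.StringIO()
try:
    raise RuntimeError("boom")
except RuntimeError as e:
    returned_id = report_and_print_id(e, fake_capture, out=buf)
```

The printed line is what a reporter would paste into their bug; nothing is printed when no ID comes back.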
Differential Revision: https://phabricator.services.mozilla.com/D100247
And don't set SHELL on mac workers (added in da452e43b5d5 because of the
exception thrown by this code not having a fallback)
Differential Revision: https://phabricator.services.mozilla.com/D97686
Here, "errors caused by local changes" means "errors whose stack traces contain a reference to a file that is in the set of files changed locally". This implementation is a trade-off:
1. This check will not catch issues caused transitively by changes to local files. For example, consider a function that has been updated and its return type changed in a backwards-incompatible way, whereas callers were not updated appropriately. This would likely manifest as a type error in the calling function after the callee has returned.
2. This check WILL catch issues that come from locally changed files where the cause of the error doesn't originate from those local changes. For example, consider a function that's been locally updated but is never called in the failing codepath; if an exception is thrown, it's not due to this local change, and we shouldn't filter it out.
There are conceivable improvements that we could apply to fix deficiency (1); for example, we could track imports recursively starting from the oldest frame in the stack trace and match on that set of imported files. Note this would not handle dynamic imports properly, and that this could exacerbate issue (2).
Issue (2) could conceivably be addressed by attempting to filter the actual local diffs down to changes that actually may be causing the error. This is difficult to do generally, especially in light of Python's dynamism, but there may be conservative improvements that we could make in this space.
Overall, neither of the above caveats is deemed sufficiently concerning to block this patch as-is, and the current situation with our Sentry logs is unusable due to all the noise. This patch will probably have a substantial impact on that noise without incidentally filtering out too much signal.
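
A minimal sketch of the check as defined above: an error counts as "caused by local changes" if any frame of its stack trace points at a locally changed file. The function name `touches_local_changes` is hypothetical, and the set of changed files is simply passed in; how it is obtained from version control is not shown here.

```python
import os
import traceback


def touches_local_changes(exc, changed_files):
    """Return True if any frame of exc's traceback is in a changed file."""
    changed = {os.path.abspath(p) for p in changed_files}
    for frame in traceback.extract_tb(exc.__traceback__):
        if os.path.abspath(frame.filename) in changed:
            return True
    return False


def _fails():
    raise ValueError("example failure")


try:
    _fails()
except ValueError as e:
    caught = e

# For the demo, treat the file that raised as "locally changed".
failing_file = traceback.extract_tb(caught.__traceback__)[-1].filename
flagged = touches_local_changes(caught, [failing_file])
not_flagged = touches_local_changes(
    caught, [os.path.join(os.sep, "no", "such", "file.py")]
)
```

Because only file names are compared, this sketch has both limitations listed above: it misses transitive breakage (1) and flags changed files whose edits are unrelated to the failure (2).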
Differential Revision: https://phabricator.services.mozilla.com/D95607
We add new metrics `distro` and `distro_version`. Their meaning varies based on the actual OS:
1. For Linux, the pair will be the name of the distribution and the distribution's version (e.g. `ubuntu`/`20.04`);
2. for macOS, the pair will be the string `macos` and the macOS version (e.g. `10.15.7`);
3. for Windows, the pair will be the string `windows` and the Windows version in `MAJOR.MINOR.BUILD` form; and
4. for all other OSes, the first will be the value of `sys.platform`, and the version string will be empty.
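
A rough sketch of how such a pair could be computed with the standard library only. The function names are hypothetical, and reading `/etc/os-release` (with a fallback of `linux` when no `ID` is found) is an assumption; the patch itself may identify the Linux distribution differently.

```python
import platform
import sys


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def get_distro_and_version():
    """Return a (distro, distro_version) pair of strings."""
    if sys.platform.startswith("linux"):
        # Assumption: /etc/os-release is the source of distro information.
        try:
            with open("/etc/os-release", encoding="utf-8", errors="replace") as f:
                fields = parse_os_release(f.read())
        except OSError:
            fields = {}
        return fields.get("ID", "linux"), fields.get("VERSION_ID", "")
    if sys.platform == "darwin":
        return "macos", platform.mac_ver()[0]
    if sys.platform == "win32":
        v = sys.getwindowsversion()
        return "windows", "{}.{}.{}".format(v.major, v.minor, v.build)
    # All other OSes: platform name with an empty version string.
    return sys.platform, ""


distro, distro_version = get_distro_and_version()
```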
Differential Revision: https://phabricator.services.mozilla.com/D94781
Allow-list all Python code in tree for use with the black linter, and re-format all code in-tree accordingly.
To produce this patch I did all of the following:
1. Make changes to tools/lint/black.yml to remove include: stanza and update list of source extensions.
2. Run ./mach lint --linter black --fix
3. Make some ad-hoc manual updates to python/mozbuild/mozbuild/test/configure/test_configure.py -- it has some hard-coded line numbers that the reformat breaks.
4. Make some ad-hoc manual updates to `testing/marionette/client/setup.py`, `testing/marionette/harness/setup.py`, and `testing/firefox-ui/harness/setup.py`, which have hard-coded regexes that break after the reformat.
5. Add a set of exclusions to black.yml. These will be deleted in a follow-up bug (1672023).
# ignore-this-changeset
Differential Revision: https://phabricator.services.mozilla.com/D94045
Allow-list all Python code in tree for use with the black linter, and re-format all code in-tree accordingly.
To produce this patch I did all of the following:
1. Make changes to tools/lint/black.yml to remove include: stanza and update list of source extensions.
2. Run ./mach lint --linter black --fix
3. Make some ad-hoc manual updates to python/mozbuild/mozbuild/test/configure/test_configure.py -- it has some hard-coded line numbers that the reformat breaks.
4. Add a set of exclusions to black.yml. These will be deleted in a follow-up bug (1672023).
# ignore-this-changeset
Differential Revision: https://phabricator.services.mozilla.com/D94045
I accidentally broke the 'mach' unittests on Python 2 due to some difference in the unittest
module. Rather than poking into 'unittest', this patch moves us closer to the pytest format
while also fixing the issue.
Differential Revision: https://phabricator.services.mozilla.com/D93420
This document was imported from MDN and contained very outdated/incorrect information, and much of the information here is duplicated from the existing `mach` documentation. For the little content that isn't already expressed in the existing documentation in a better way, merge it into `python/mach/docs`.
The unique content is mainly in the FAQ, so I added a new page for that.
Differential Revision: https://phabricator.services.mozilla.com/D91455
Allows mach commands to define their own glean metrics with the `metrics_path` @CommandProvider parameter.
When `metrics_path` is defined:
* A `metrics` kwarg is provided to the decorated class. This `metrics` handle is a Glean instance, so Glean documentation should be consulted for usage information.
* When `mach doc telemetry` is run, metrics docs will be generated from all the registered metrics files.
Note: there was some consideration between making `metrics_path` a @CommandProvider or @Command parameter.
In the end, @CommandProvider seemed like a better fit because:
* Metrics seem to be more associated with the entire class than a specific command/method. This is because a class represents a "domain", and that domain may have different commands that have overlapping metrics. Accordingly, all the metrics should be defined once as available to the entire class.
* Currently, @Command methods only take parameters that map one-to-one with CLI arguments. It could seem inconsistent to have one exception: the metrics handle.
Differential Revision: https://phabricator.services.mozilla.com/D85953
In the patch for bug 1656993, the case in which
get_command was being set was removed.
Accordingly, its usage in CommandAction will always be evaluated to
`False`, and it can be deleted.
Differential Revision: https://phabricator.services.mozilla.com/D90198
In addition to the existing build telemetry, also gather the stats and
report with Glean. This new telemetry is reported in tandem with the existing
telemetry to allow testing and confidence before a full roll-out.
Additionally, Glean isn't compatible with Python 2, so the new telemetry only runs
on Python 3 mach commands.
Differential Revision: https://phabricator.services.mozilla.com/D83572
This, hopefully, begins to address an ongoing global problem where we have few, if any, insights into the performance of individual build tasks (compilations, calls into Python scripts, etc.) At most we have aggregated statistics about how long tiers last, combined with `sccache` aggregates across the entire build (which don't cover non-compilation tasks). This has a few implications:
1. It's impossible to identify bottlenecks, except by going out of your way to notice and reproduce them. e.g. no one, to my knowledge, was aware that `make_dafsa.py` was a bottleneck until someone happened to notice and report it in bug 1629337. We could have systems that automatically detect this sort of thing, or at least that make it easier to do so than by CTRL-C'ing in the middle of the build several times to try to reproduce the problem.
2. It's impossible to detect regressions, unless the regression is so pronounced and severe that it has an immediate impact on the overall build time and triggers build time alerts.
3. It's impossible to identify that you have *fixed* regressions, except by doing ad-hoc timing measurements by building individual `make` targets. This is error-prone and annoying.
Here we propose a low-friction system wherein individual build tasks log their own build perf info. For now, that's a write to `stdout` consisting of the string `BUILDTASK ` followed by a simple JSON object with a start time, end time, the `argv` of the task, and an additional `"context"` key (I anticipate this could be used to annotate the task with relevant per-task data for later aggregation, for example: was this an `sccache` cache hit or not? For now, it's empty everywhere). The build controller then collects this data, validates it, and writes out the entire list of build tasks as a JSON file after the build has completed, similarly to what we already do with `build_resources.json`. We already parse some `make` output to do stuff like tracking when we switch tiers, so this isn't a huge architectural shift or anything.
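
To illustrate the line format and the collection step, here is a small sketch. Only the `BUILDTASK ` prefix, the `argv` field, and the `"context"` key come from the description above; the key names `start` and `end` and all function names are assumptions made for this example.

```python
import json

PREFIX = "BUILDTASK "
REQUIRED_KEYS = {"start", "end", "argv", "context"}


def format_buildtask(start, end, argv, context=None):
    """Build the line a task would write to stdout."""
    record = {
        "start": start,
        "end": end,
        "argv": list(argv),
        "context": context or {},
    }
    return PREFIX + json.dumps(record)


def parse_buildtask(line):
    """Return the task record for a valid BUILDTASK line, else None."""
    if not line.startswith(PREFIX):
        return None
    try:
        record = json.loads(line[len(PREFIX):])
    except ValueError:
        return None  # e.g. output trampled by another process
    if not isinstance(record, dict) or not REQUIRED_KEYS.issubset(record):
        return None
    return record


def collect_buildtasks(lines):
    """Collect every valid BUILDTASK record from a stream of output lines."""
    tasks = []
    for line in lines:
        record = parse_buildtask(line.rstrip("\n"))
        if record is not None:
            tasks.append(record)
    return tasks


output = [
    "make: Entering directory 'obj'\n",
    format_buildtask(1.0, 2.5, ["python", "make_dafsa.py"]) + "\n",
    'BUILDTASK {"start": 3.0, "end"\n',  # truncated line is dropped
]
tasks = collect_buildtasks(output)
```

Dropping lines that fail validation mirrors the data-loss trade-off discussed below: a trampled line costs one task record rather than breaking the whole collection.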
In my opinion this "should" happen at the build system, or `make`, level, but `make` doesn't expose anything resembling this information to my knowledge, so this has to be implemented outside of `make`. One could implement something like this at the `sccache` level but that doesn't touch anything but C/C++/Rust compilation tasks; an ideal solution would support other generic build tasks. We could also fork `make` to add this feature ourselves, but for several reasons I don't think that's tractable. :)
Of course, this approach has downsides:
1. We depend on parsing the `stdout` of `make`, and processes can unfortunately sometimes trample on each other, leading to data loss for individual build tasks occasionally. This is a necessary limitation of the model to my knowledge, and I don't know that it can be fixed generally. In my testing, not much data tends to be lost usually.
2. Dumping arbitrary data to `stdout` isn't always possible or desirable. If you're not careful about it this can also result in noisier-than-necessary tasks, especially when those tasks are not invoked by a parent process that knows how to handle the special `BUILDTASK` lines.
3. This data is raw enough that aggregation is not completely trivial.
4. This functionality has to be added for any new kind of build task whose performance we'd like to track; it doesn't come "for free" due to not being able to be implemented at the build system level.
5. The data isn't awfully small due to the `argv`s (at this point, not nearly big enough that we need to be concerned about it IMO, but maybe that will change in the future?)
One can imagine a couple of other architectures that could avoid the first two problems, namely: 1) we could use a "real" database, like `sqlite3`, that would not dump info to `stdout` and wouldn't lose data; or, 2) we could set up another server, similar to `sccache`, that collects this data from subprocesses and aggregates it, making sure not to lose any along the way. Both of these have enough overhead, in terms of engineering effort or actual impact on latency, that I don't know that they make any sense to even attempt implementing. The remaining downsides continue to be real issues, however.
After this is landed there are a few ways forward. We can start uploading these files as build artifacts in CI to allow us to reason about performance impacts of changes in `central`. We can easily add this functionality to the `sccache` client to start tracking those builds as well. We already have a very simple visualization of build tier timing in `mach resource-usage`; we could join that data against the `BUILDTASK` data to produce a very clear visualization of build bottlenecks, i.e., "why is the `export` tier taking so long", etc.
Differential Revision: https://phabricator.services.mozilla.com/D80284
In two different places we've been encountering issues regarding 1) how we configure the system Python environment and 2) how the system Python environment relates to the `virtualenv`s that we use for building, testing, and other dev tasks. Specifically:
1. With the push to use `glean` for telemetry in `mach`, we are requiring (or rather, strongly encouraging) the `glean_sdk` Python package to be installed with bug 1651424. `mach bootstrap` upgrades the library using your system Python 3 in bug 1654607. We can't vendor it due to the package containing native code. Since we generally vendor all code required for `mach` to function, requiring that the system Python be configured with a certain version of `glean` is an unfortunate change.
2. The build uses the vendored `glean_parser` for a number of build tasks. Since the vendored `glean_parser` conflicts with the globally-installed `glean_sdk` package, we had to add special ad-hoc handling to allow us to circumvent this conflict in bug 1655781.
3. We begin to rely more and more on the `zstandard` package during build tasks, this package again being one that we can't vendor due to containing native code. Bug 1654994 contained more ad-hoc code which subprocesses out from the build system's `virtualenv` to the SYSTEM `python3` binary, assuming that the system `python3` has `zstandard` installed.
As we rely more on `glean_sdk`, `zstandard`, and other packages that are not vendorable, we need to settle on a standard model for how `mach`, the build process, and other `mach` commands that may make their own `virtualenv`s work in the presence of unvendorable packages.
With that in mind, this patch does all the following:
1. Separate out the `mach` `virtualenv_packages` from the in-build `virtualenv_packages`. Refactor the common stuff into `common_virtualenv_packages.txt`. Add functionality to the `virtualenv_packages` manifest parsing to allow the build `virtualenv` to "inherit" from the parent by pointing to the parent's `site-packages`. The `in-virtualenv` feature from bug 1655781 is no longer necessary, so delete it.
2. Add code to `bootstrap`, as well as a new `mach` command `create-mach-environment` to create `virtualenv`s in `~/.mozbuild`.
3. Add code to `mach` to dispatch either to the in-`~/.mozbuild` `virtualenv`s or, for commands which cannot run in the `virtualenv`s (namely `bootstrap` and `create-mach-environment`), to the system Python 3.
4. Remove the "add global argument" feature from `mach`. It isn't used and conflicts with (3).
5. Remove the `--print-command` feature from `mach` which is obsoleted by these changes.
This has the effect of allowing us to install packages that cannot be vendored into a "common" place (namely the global `~/.mozbuild` `virtualenv`s) and use those from the build without requiring us to hit the network. Miscellaneous implementation notes:
1. We allow users to force running `mach` with the system Python if they like. For now it doesn't make any sense to require 100% of people to create these `virtualenv`s when they're allowed to continue on with the old behavior if they like. We also skip this in CI.
2. We needed to duplicate the global-argument logic into the `mach` script to allow for the dispatch behavior. This is something we avoided with the Python 2 -> Python 3 migration with the `--print-command` feature, justifying its use by saying it was only temporarily required until all `mach` commands were running with Python 3. With this change, we'll need to be able to determine the `mach` command from the shell script for the foreseeable future, and committing to this forever with the cost that `--print-command` incurs (namely `mach` startup time, an additional .4s on my machine) didn't seem worth it to me. It's not a ton of duplicated code.
Differential Revision: https://phabricator.services.mozilla.com/D85916
Now you can pass the `virtualenv_name` kwarg to the `Command` decorator which will configure the `_virtualenv_manager` accordingly.
Differential Revision: https://phabricator.services.mozilla.com/D86256
Today we don't require that `mach` `CommandProvider`s subclass from any particular parent class and we're very lax about the requirements they must meet. While that's convenient in certain circumstances, it has some unfortunate implications for feature development.
Today the only requirement that we have for `CommandProvider`s is that they have an `__init__()` method that takes either 1 or 2 arguments, the second of which must be called `context` and is populated with the `mach` `CommandContext`. Again, while this flexibility is occasionally convenient, it is limiting. As we add features to `mach`, having a better idea of what the shape of our `CommandProvider`s is and how we can instantiate and use them is increasingly important, and this gives us additional control when having `mach` configure `CommandProvider`s based on data that is only available at the `mach` level. In particular, we plan to leverage this in bugs 985141 and 1654074.
Here we add validation to the `CommandProvider` decorator to ensure all classes inherit from `MachCommandBase`, update all `CommandProvider`s in-tree to inherit from `MachCommandBase`, and update source and test code accordingly.
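
As a simplified illustration of this kind of validation, the sketch below uses local stand-ins that only borrow the names `MachCommandBase` and `CommandProvider`; they are not the real `mach` classes, and the exception type and message are assumptions.

```python
class MachCommandBase:
    """Local stand-in for the real base class, reduced to a constructor."""

    def __init__(self, context):
        self.context = context


def CommandProvider(cls):
    """Stand-in class decorator that rejects providers lacking the base class."""
    if not issubclass(cls, MachCommandBase):
        # Assumption: the real decorator may raise a different error type.
        raise TypeError(
            "{} must inherit from MachCommandBase".format(cls.__name__)
        )
    return cls


@CommandProvider
class GoodProvider(MachCommandBase):
    pass


try:
    @CommandProvider
    class BadProvider:
        pass
except TypeError as e:
    rejection = str(e)
```

Failing at decoration time means a non-conforming provider is caught when its module is imported, rather than when a command is first dispatched.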
Follow-up work: we now require (de facto) that the `context` be populated with a `topdir` attribute by the `populate_context_handler` function, since instantiating the `MachCommandBase` requires a `topdir` be provided. This is fine for now in the interest of keeping this patch reasonably sized, but some additional refactoring could make this cleaner.
Differential Revision: https://phabricator.services.mozilla.com/D86255
These tests depend on the `mach uuid` command, which was deleted with bug 1639509; now that `mach uuid` is gone, they are broken unconditionally. We could replace the reference to `uuid` with a new no-op `mach` command, but we're in the process of replacing our telemetry code with use of the `glean` API, and the new telemetry code won't have the same semantics (namely, we are unlikely to want to continue to guarantee that sub-`mach` invocations aren't covered by telemetry), so these tests might as well just be deleted now.
Differential Revision: https://phabricator.services.mozilla.com/D85911
@CommandProvider does parameter validation and collects information (such
as "pass_context") that will be needed by Registrar.
However, rather than just checking parameter length, we can make it more
specific and assert that the specific expected parameter ("context") exists.
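
A small sketch of the difference between checking parameter count and checking the parameter name, using `inspect.signature`. The helper name `wants_context`, the demo classes, and the error type are hypothetical; this is not the validation code from the patch.

```python
import inspect


def wants_context(cls):
    """Return True if cls.__init__ takes a `context` argument.

    Raises TypeError if __init__ takes extra arguments under any other name.
    """
    if cls.__init__ is object.__init__:
        return False  # no custom constructor, so nothing to pass
    params = list(inspect.signature(cls.__init__).parameters)
    extra = params[1:]  # skip `self`
    if not extra:
        return False
    if extra == ["context"]:
        return True
    raise TypeError(
        "{}.__init__ must take only 'context', got {}".format(cls.__name__, extra)
    )


class NoContext:
    def __init__(self):
        pass


class WithContext:
    def __init__(self, context):
        self.context = context


class BadName:
    def __init__(self, ctx):
        self.ctx = ctx
```

A length-only check would accept `BadName`; checking the name rejects it up front.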
Differential Revision: https://phabricator.services.mozilla.com/D85482
Sentry is initialized globally, but it's not clear to consumers when this actually happens.
For example, an unwary developer may call report_exception() within check_and_get_mach(),
not knowing that Sentry hasn't been initialized yet.
Using a class should make the dependency on register_sentry() more explicit.
Depends on D80913
Differential Revision: https://phabricator.services.mozilla.com/D80918
Sentry needs to be able to send data in a minimal environment, and intelligently determining
the "topobjdir" (without leaning on Mach) is tough.
Instead, since the "topobjdir" usually falls under the "topsrcdir", just leaning on <topsrcdir>
patching should be sufficient.
Differential Revision: https://phabricator.services.mozilla.com/D80122
To avoid sending identifying information, common absolute paths are patched with placeholder values. For example, devs
may place their Firefox repository within their home dir, so absolute paths are doctored to be prefixed with
"<topsrcdir>" instead.
Additionally, any paths including the user's home directory are patched to instead be a relative path from "~".
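
A minimal sketch of the two substitutions described above. The function name `patch_path` and the example directories are hypothetical; the order of the checks (source directory before home directory) is a choice made for this sketch because the repository often lives inside the home directory.

```python
import os


def patch_path(path, topsrcdir, home):
    """Replace identifying path prefixes with placeholder values."""
    path = os.path.normpath(path)
    roots = (
        (os.path.normpath(topsrcdir), "<topsrcdir>"),
        (os.path.normpath(home), "~"),
    )
    for root, placeholder in roots:
        if path == root:
            return placeholder
        # Require a separator so "/home/dev2" does not match "/home/dev".
        if path.startswith(root + os.sep):
            return placeholder + path[len(root):]
    return path


home = os.path.join(os.sep, "home", "dev")
topsrcdir = os.path.join(home, "mozilla-central")

patched_src = patch_path(
    os.path.join(topsrcdir, "python", "mach", "main.py"), topsrcdir, home
)
patched_home = patch_path(os.path.join(home, ".mozbuild", "log.txt"), topsrcdir, home)
untouched = patch_path(os.path.join(os.sep, "usr", "lib", "x.py"), topsrcdir, home)
```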
Differential Revision: https://phabricator.services.mozilla.com/D78962
It is possible to have both a default command (with positional arguments) and
sub-commands (with arguments) in mach. If the subcommand exists, it
is dispatched to; if it doesn't, the default command is called with the
positional argument filled in.
However, when you ran ./mach command --help, it would detect the subcommands
and only print out their help section. If the default command had arguments,
they were not printed out. Now they are.
Small papercuts in this patch are that the Default Command Arguments are
printed after the subcommands and that subcommand help without default
arguments has an extra newline after it. Both of these seem small
enough that the refactoring necessary to abate them is undesirable.
Differential Revision: https://phabricator.services.mozilla.com/D76505