Document all the Python things (#491)

* Document all the Python things
* Address low-hanging fruit comments in the PR
* Update test fixture instructions
* Clarify events ping
* Typo
* Fix typo
* Add link to other docs.
* Add note about Python data directory
* Update docs/user/testing-metrics.md

Co-Authored-By: Raphael Pierzina <raphael@hackebrot.de>
2019-11-20 13:11:31 -05:00 · 2019-11-20 13:11:31 -05:00 · 8b511a346a
--- a/docs/dev/core/internal/directory-structure.md
+++ b/docs/dev/core/internal/directory-structure.md
@ -8,6 +8,8 @@ On Android, this directory lives inside the [`ApplicationInfo.dataDir`](https://

 On iOS, this directory lives inside the [`Documents`](https://developer.apple.com/library/archive/documentation/FileManagement/Conceptual/FileSystemProgrammingGuide/FileSystemOverview/FileSystemOverview.html) directory associated with the application.

+For the Python bindings, if no directory is specified, the data is stored in a temporary directory that is cleared at exit.
+
 Within the `glean_data` directory are the following contents:

 - `db`: Contains the [rkv](https://github.com/mozilla/rkv) database used to persist ping and user lifetime metrics.
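
The "temporary directory, cleared at exit" behavior noted for the Python bindings can be illustrated with the standard library alone. This is a minimal sketch, not the bindings' actual implementation: the helper name `make_scratch_data_dir` and the `glean_data_` prefix are hypothetical.

```python
import os
import tempfile


def make_scratch_data_dir():
    """Create a temporary directory and return the object that owns it.

    Hypothetical helper: the directory is removed when cleanup() is
    called on the returned object, or at interpreter exit if it is
    still alive by then.
    """
    return tempfile.TemporaryDirectory(prefix="glean_data_")


holder = make_scratch_data_dir()
data_dir = holder.name  # path that could be handed to an application
exists_before = os.path.isdir(data_dir)

holder.cleanup()  # explicit cleanup; otherwise this happens at exit
exists_after = os.path.exists(data_dir)
```

Data kept this way does not survive between runs, so an application that needs persistence would have to specify its own directory.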
--- a/docs/dev/python/setting-up-python-build-environment.md
+++ b/docs/dev/python/setting-up-python-build-environment.md
@ -2,7 +2,7 @@

 This document describes how to set up an environment for the development of the Glean Python bindings.

-Instructions for installing a copy of the Glean Python bindings into your own environment for use in your project are TBD.
+Instructions for installing a copy of the Glean Python bindings into your own environment for use in your project are described in [adding Glean to your project](../../user/adding-glean-to-your-project.html).

 ## Prerequisites

--- a/docs/user/adding-glean-to-your-project.md
+++ b/docs/user/adding-glean-to-your-project.md
@ -111,18 +111,73 @@ github "mozilla/glean" "master"

 </div>

+<div data-lang="Python" class="tab">
+
+It is recommended that you use a virtual environment for your work to isolate the dependencies for your project. There are many popular abstractions on top of virtual environments in the Python ecosystem which can help manage your project dependencies.
+
+The Python Glean bindings currently have [prebuilt wheels on PyPI for x86_64 Linux only](https://pypi.org/project/glean-sdk/#files).
+
+If you're running that platform and have your virtual environment set up and activated, you can install Glean into it using:
+
+```bash
+$ python -m pip install glean_sdk
+```
+
+If you are not on x86_64 Linux, you will need to build the Glean Python bindings from source using [these instructions](../dev/python/setting-up-python-build-environment.html).
+
+</div>
+
 {{#include ../tab_footer.md}}

 ### Adding new metrics

 All metrics that your project collects must be defined in a `metrics.yaml` file.
-Add this file to your project and define it as an input file for the `sdk_generator.sh` script in the `Run Script` step defined before.
+
 The format of that file is documented [with `glean_parser`](https://mozilla.github.io/glean_parser/metrics-yaml.html).
 To learn more, see [adding new metrics](adding-new-metrics.md).

 > **Important**: as stated [before](adding-glean-to-your-project.md#before-using-glean), any new data collection requires documentation and data-review.
 > This is also required for any new metric automatically collected by the Glean SDK.

+{{#include ../tab_header.md}}
+
+<div data-lang="Swift" class="tab">
+
+On iOS, add this file to your project and define it as an input file for the `sdk_generator.sh` script in the `Run Script` step defined before.
+
+</div>
+
+<div data-lang="Python" class="tab">
+
+For Python, the `metrics.yaml` file must be available and loaded at runtime.
+
+If your project is a script (i.e. just Python files in a directory), you can load the `metrics.yaml` using:
+
+```Python
+from glean import load_metrics
+
+metrics = load_metrics("metrics.yaml")
+
+# Use a metric on the returned object
+metrics.your_category.your_metric.set("value")
+```
+
+If your project is a distributable Python package, you need to include the `metrics.yaml` file using [one of the myriad ways to include data in a Python package](https://setuptools.readthedocs.io/en/latest/setuptools.html#including-data-files) and then use [`pkg_resources.resource_filename()`](https://setuptools.readthedocs.io/en/latest/pkg_resources.html#resource-extraction) to get the filename at runtime.
+
+```Python
+from glean import load_metrics
+from pkg_resources import resource_filename
+
+metrics = load_metrics(resource_filename(__name__, "metrics.yaml"))
+
+# Use a metric on the returned object
+metrics.your_category.your_metric.set("value")
+```
+
+</div>
+
+{{#include ../tab_footer.md}}
+
 ### Adding custom pings

 Please refer to the [custom pings documentation](pings/custom.md).
@ -144,13 +199,16 @@ These specific steps are described in [the `probe_scraper` documentation](https:

 The following steps are required for applications using the Glean SDK, but not libraries.

+> **Important:** The Glean SDK should only be initialized from the main application, not individual libraries.
+
+If you are adding Glean support to a library, you can safely skip this section.
+
 {{#include ../tab_header.md}}

 <div data-lang="Kotlin" class="tab">

 ### Initializing the Glean SDK

-The Glean SDK should only be initialized from the main application, not individual libraries.  If you are adding Glean support to a library, you can safely skip this section.
 Please also note that the Glean SDK does not support use across multiple processes, and must only be initialized on the application's main process. Initializing in other processes is a no-op.
 Additionally, Glean must be initialized on the main (UI) thread of the applications main process. Failure to do so will throw an `IllegalThreadStateException`.

@ -198,7 +256,7 @@ This method should also be called at least once prior to calling `Glean.initiali

 The application should provide some form of user interface to call this method.

-When going from enabled to disabled, all pending events, metrics and pings are cleared, except for `first_run_date`.
+When going from enabled to disabled, all pending events, metrics and pings are cleared, except for [`first_run_date`](pings/index.html#the-client_info-section).
 When re-enabling, core Glean metrics will be recomputed at that time.

 </div>
@ -207,8 +265,6 @@ When re-enabling, core Glean metrics will be recomputed at that time.

 ### Initializing the Glean SDK

-The Glean SDK should only be initialized from the main application, not individual libraries.
-If you are adding Glean support to a library, you can safely skip this section.
 Please also note that the Glean SDK does not support use across multiple processes, and must only be initialized on the application's main process.

 An excellent place to initialize Glean is within the `application(_:)` method of the class that extends the `UIApplicationDelegate` class.
@ -255,23 +311,49 @@ This method should also be called at least once prior to calling `Glean.shared.i

 The application should provide some form of user interface to call this method.

-When going from enabled to disabled, all pending events, metrics and pings are cleared, except for `first_run_date`.
+When going from enabled to disabled, all pending events, metrics and pings are cleared, except for [`first_run_date`](pings/index.html#the-client_info-section).
 When re-enabling, core Glean metrics will be recomputed at that time.

 </div>

 <div data-lang="Python" class="tab">

-> **Note**: This content is a placeholder.  The Python bindings are under development.
-
 ### Initializing the Glean SDK

+The main control for Glean is on the `glean.Glean` singleton.
+
+The Glean SDK should be initialized as soon as possible, and importantly, before any other libraries in the application start using Glean.
+Library code should never call `Glean.initialize`, since it should be called exactly once per application.
+
+
 ```python
-import glean
-cfg = glean.Configuration()
-glean.Glean.initialize(cfg, "/path/to/datadir")
+from glean import Glean
+
+# Call Glean.set_upload_enabled first, since Glean.initialize might send pings
+# if there are any metrics queued up from a previous run.
+Glean.set_upload_enabled(True)
+
+Glean.initialize(
+    "my-app-id",  # The id of your application
+    "0.1.0",  # The version of your application
+)
 ```

+Additional configuration is available on the `glean.Configuration` object, which can be passed into `Glean.initialize()`.
+
+Unlike Android and Swift, the Python bindings do not automatically send any pings. 
+See the [custom pings documentation](pings/custom.md) about adding custom pings and sending them.
+
+### Enabling and disabling metrics
+
+`Glean.set_upload_enabled()` should be called in response to the user enabling or disabling telemetry.
+This method should also be called at least once prior to calling `Glean.initialize()`.
+
+The application should provide some form of user interface to call this method.
+
+When going from enabled to disabled, all pending events, metrics and pings are cleared, except for [`first_run_date`](pings/index.html#the-client_info-section).
+When re-enabling, core Glean metrics will be recomputed at that time.
+
 </div>

 {{#include ../tab_footer.md}}
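
For the packaged case in the Python tab above, the underlying idea is to resolve `metrics.yaml` relative to the installed module rather than the current working directory. The sketch below shows that idea with `pathlib` only. It is an alternative illustration, not what the docs prescribe, and the helper name `sibling_data_file` is hypothetical. It assumes the package is installed as plain files on disk (not inside a zip archive).

```python
from pathlib import Path


def sibling_data_file(module_file, filename):
    """Return the path of `filename` in the same directory as `module_file`.

    Hypothetical helper: `module_file` is typically a module's `__file__`.
    """
    return Path(module_file).parent / filename


# Example with a made-up install location:
path = sibling_data_file("/opt/app/mypkg/__init__.py", "metrics.yaml")
```

Inside a package module, the result of `sibling_data_file(__file__, "metrics.yaml")` could be converted with `str()` and passed to `load_metrics()`, assuming the file was shipped as package data.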
--- a/docs/user/adding-new-metrics.md
+++ b/docs/user/adding-new-metrics.md
@ -139,4 +139,10 @@ GleanMetrics.Views.loginOpened...

 </div>

+<div data-lang="Python" class="tab">
+
+Category and metric names in the `metrics.yaml` are in `snake_case`, which matches the [PEP8](https://www.python.org/dev/peps/pep-0008/) standard, so no translation is needed for Python.
+
+</div>
+
 {{#include ../tab_footer.md}}
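
To make the "no translation is needed" point concrete, the sketch below shows the kind of `snake_case` to `camelCase` renaming that other language bindings apply (for example `login_opened` becoming `loginOpened`) and that Python skips. The function `snake_to_camel` is hypothetical and is not the code generator's actual implementation.

```python
def snake_to_camel(name):
    """Convert a snake_case identifier to camelCase (illustrative only)."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


yaml_name = "login_opened"
other_bindings_name = snake_to_camel(yaml_name)  # renamed for camelCase APIs
python_name = yaml_name  # Python uses the name from metrics.yaml unchanged
```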
--- a/docs/user/debugging.md
+++ b/docs/user/debugging.md
@ -1,4 +1,4 @@
-# Debugging products using the Glean SDK
+# Debugging products using the Glean SDK for Android

 The Glean SDK exports the `GleanDebugActivity` that can be used to toggle debugging features on or off.
 Users can invoke this special activity, at run-time, using the following [`adb`](https://developer.android.com/studio/command-line/adb) command:
--- a/docs/user/experiments-api.md
+++ b/docs/user/experiments-api.md
@ -31,8 +31,6 @@ Please also note that the `extra` map is non-nested arbitrary `String` to `Strin
 There are test APIs available too:

 ```Kotlin
-import org.mozilla.yourApplication.GleanMetrics.SearchDefault
-
 // Was the experiment annotated in Glean pings?
 assertTrue(Glean.testIsExperimentActive("blue-button-effective"))
 // Was the correct branch reported?
@ -49,6 +47,44 @@ assertEquals(

 </div>

+<div data-lang="Python" class="tab">
+
+```Python
+from glean import Glean
+
+# Annotate Glean pings with experiments data.
+Glean.set_experiment_active(
+  experiment_id="blue-button-effective",
+  branch="branch-with-blue-button",
+  extra={
+    "buttonLabel": "test"
+  }
+)
+
+# After the experiment terminates, the annotation
+# can be removed.
+Glean.set_experiment_inactive("blue-button-effective")
+```
+
+> **Important**: Experiment IDs and branches don't need to be pre-defined in the Glean SDK registry files.
+Please also note that the `extra` dict is a non-nested, arbitrary `str` to `str` mapping.
+
+There are test APIs available too:
+
+```Python
+from glean import Glean
+
+# Was the experiment annotated in Glean pings?
+assert Glean.test_is_experiment_active("blue-button-effective")
+# Was the correct branch reported?
+assert (
+    "branch-with-blue-button" ==
+    Glean.test_get_experiment_data("blue-button-effective").branch
+)
+```
+
+</div>
+
 {{#include ../tab_footer.md}}

 ## Limits
@ -61,3 +97,4 @@ assertEquals(
 ## Reference

 * [Kotlin API docs](../../javadoc/glean/mozilla.telemetry.glean/-glean.html).
+* [Python API docs](../../python/glean/glean.html)
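
Because the `extra` argument in the Python tab above must be a flat `str` to `str` mapping, an application may want to check its data before passing it along. The following is a minimal sketch of such a check. The helper `is_flat_str_mapping` is hypothetical and not part of the Glean API, and how the bindings themselves react to invalid `extra` values is not covered here.

```python
def is_flat_str_mapping(extra):
    """Return True if `extra` is a dict whose keys and values are all str."""
    return isinstance(extra, dict) and all(
        isinstance(key, str) and isinstance(value, str)
        for key, value in extra.items()
    )


ok = is_flat_str_mapping({"buttonLabel": "test"})
nested = is_flat_str_mapping({"button": {"label": "test"}})
non_str = is_flat_str_mapping({"clicks": 3})
```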
--- a/docs/user/metrics/labeled_counters.md
+++ b/docs/user/metrics/labeled_counters.md
@ -107,7 +107,7 @@ assert 1 == metrics.stability.crash_count.test_get_num_recorded_errors(

 * Labels are limited to starting with either a letter or an underscore character.

-* Each label must have a maximum of 60 characters.
+* Each label must be at most 60 bytes long when encoded as UTF-8.

 * If the labels are specified in the `metrics.yaml`, using a different label will be replaced with the special value `__other__`.

--- a/docs/user/metrics/string.md
+++ b/docs/user/metrics/string.md
@ -114,21 +114,21 @@ XCTAssertEqual(1, SearchDefault.name.testGetNumRecordedErrors(.invalidValue))
 from glean import load_metrics
 metrics = load_metrics("metrics.yaml")

-// Record a value into the metric.
+# Record a value into the metric.
 metrics.search_default.name.set("duck duck go")
-// If it changed later, you can record the new value:
+# If it changed later, you can record the new value:
 metrics.search_default.name.set("wikipedia")
 ```

 There are test APIs available too:

-```Kotlin
-// Was anything recorded?
+```Python
+# Was anything recorded?
 assert metrics.search_default.name.test_has_value()
-// Does the string metric have the expected value?
-// IMPORTANT: It may have been truncated -- see "Limits" below
+# Does the string metric have the expected value?
+# IMPORTANT: It may have been truncated -- see "Limits" below
 assert "wikipedia" == metrics.search_default.name.test_get_value()
-// Was the string truncated, and an error reported?
+# Was the string truncated, and an error reported?
 assert 1 == metrics.search_default.name.test_get_num_recorded_errors(
    ErrorType.INVALID_VALUE
 )
@ -140,7 +140,7 @@ assert 1 == metrics.search_default.name.test_get_num_recorded_errors(

 ## Limits

-* Fixed maximum string length: 50. Longer strings are truncated. For the original Kotlin implementation of the Glean SDK, this is measured in Unicode characters. For the Rust implementation, this is measured in the number of bytes when the string is encoded in UTF-8.
+* Fixed maximum string length: 50. Longer strings are truncated. This is measured in the number of bytes when the string is encoded in UTF-8.

 ## Examples

--- a/docs/user/metrics/string_list.md
+++ b/docs/user/metrics/string_list.md
@ -90,7 +90,7 @@ XCTAssertEqual(1, Search.engines.testGetNumRecordedErrors(.invalidValue))

 * Empty string lists are not accepted by the `set()` method. Attempting to record an empty string list will result in an `invalid_value` error and nothing being recorded.

-* Fixed maximum string length: 50. Longer strings are truncated. For the original Kotlin implementation of the Glean SDK, this is measured in Unicode characters. For the Rust implementation, this is measured in the number of bytes when the string is encoded in UTF-8.
+* Fixed maximum string length: 50. Longer strings are truncated. This is measured in the number of bytes when the string is encoded in UTF-8.

 * Fixed maximum list length: 20 items. Additional strings are dropped.
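
The byte-based limits in the hunks above mean that a string with non-ASCII characters reaches the 50-byte limit before it reaches 50 characters. The sketch below illustrates this with a hypothetical `truncate_utf8` helper. It assumes truncation happens on a character boundary, which is an assumption made for illustration and not a statement about Glean's actual truncation logic.

```python
def truncate_utf8(value, max_bytes=50):
    """Truncate `value` so that its UTF-8 encoding fits in `max_bytes` bytes."""
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return value
    # Cut at the byte limit, then drop any partial trailing character.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


ascii_result = truncate_utf8("a" * 60)       # 1 byte per character
accented_result = truncate_utf8("é" * 30)    # 2 bytes per character
```

Here the ASCII input keeps 50 characters, while the accented input keeps only 25, since each `é` occupies two bytes in UTF-8.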

--- a/docs/user/pings/baseline.md
+++ b/docs/user/pings/baseline.md
@ -5,6 +5,8 @@
 This ping is intended to provide metrics that are managed by the library itself, and not explicitly set by the application or included in the application's `metrics.yaml` file.
 If the application crashes no `baseline` ping is sent, no additional ping is generated with the data from before the crash.

+> **Note:** As the `baseline` ping was specifically designed for mobile operating systems, it is not sent when using the Glean Python bindings.
+
 ## Scheduling

 The `baseline` ping is automatically sent when the application is moved to the [background](index.md#defining-background-state).
--- a/docs/user/pings/custom.md
+++ b/docs/user/pings/custom.md
@ -59,12 +59,25 @@ override fun onCreate() {

 <div data-lang="Python" class="tab">

+For Python, the `pings.yaml` file must be available and loaded at runtime.
+
+If your project is a script (i.e. just Python files in a directory), you can load the `pings.yaml` using:
+
 ```
 from glean import load_pings

 pings = load_pings("pings.yaml")
 ```

+If your project is a distributable Python package, you need to include the `pings.yaml` file using [one of the myriad ways to include data in a Python package](https://setuptools.readthedocs.io/en/latest/setuptools.html#including-data-files) and then use [`pkg_resources.resource_filename()`](https://setuptools.readthedocs.io/en/latest/pkg_resources.html#resource-extraction) to get the filename at runtime.
+
+```Python
+from glean import load_pings
+from pkg_resources import resource_filename
+
+pings = load_pings(resource_filename(__name__, "pings.yaml"))
+```
+
 </div>

 {{#include ../../tab_footer.md}}
@ -113,6 +126,10 @@ Pings.search.send()
 <div data-lang="Python" class="tab">

 ```Python
+from glean import load_pings
+
+pings = load_pings("pings.yaml")
+
 pings.search.send()
 ```

--- a/docs/user/pings/events.md
+++ b/docs/user/pings/events.md
@ -8,14 +8,16 @@ If the application crashes, an `events` ping is generated next time the applicat

 The `events` ping is sent under the following circumstances:

- Normally, it is sent when the application goes into the [background](index.md#defining-background-state), if there are any recorded events to send.
+1. Normally, it is sent when the application goes into the [background](index.md#defining-background-state), if there are any recorded events to send.

- When the queue of events exceeds `Glean.configuration.maxEvents` (default 500).
+2. When the queue of events exceeds `Glean.configuration.maxEvents` (default 500).

- If there are any unsent events found on disk when starting the application. It would be impossible to coordinate the timestamps across a reboot, so it's best to just collect all events from the previous run into their own ping, and start over.
+3. If there are any unsent events found on disk when starting the application. It would be impossible to coordinate the timestamps across a reboot, so it's best to just collect all events from the previous run into their own ping, and start over.

 All of these cases are handled automatically, with no intervention or configuration required by the application.

+> **Note:** Since the Python bindings don't have a concept of "going to background", case (1) above does not apply.
+
 ## Contents
 At the top-level, this ping contains the following keys:

--- a/docs/user/pings/metrics.md
+++ b/docs/user/pings/metrics.md
@ -7,6 +7,8 @@ Ideally, this window is expected to be about 24 hours, given that the collection
 Data in the [`ping_info`](index.md#the-ping_info-section) section of the ping can be used to infer the length of this window.
 If the application crashes, unsent recorded metrics are sent along with the next `metrics` ping.

+> **Note:** As the `metrics` ping was specifically designed for mobile operating systems, it is not sent when using the Glean Python bindings.
+
 ## Scheduling
 The desired behaviour is to collect the ping at the first available opportunity after 4AM local time on a new calendar day. 
 This breaks down into three scenarios:
--- a/docs/user/pings/testing-custom-pings.md
+++ b/docs/user/pings/testing-custom-pings.md
@ -1,4 +1,4 @@
-# Unit testing Glean custom pings
+# Unit testing Glean custom pings for Android

 Applications defining [custom pings](custom.md) can use use the strategy defined in this document to test these pings in unit tests.

--- a/docs/user/testing-metrics.md
+++ b/docs/user/testing-metrics.md
@ -98,11 +98,57 @@ Note that each of these functions is marked as `internal`, you need to import `G

 </div>

+<div data-lang="Python" class="tab">
+
+It is generally a good practice to "reset" Glean prior to every unit test that uses Glean, to prevent side effects of one unit test impacting others.
+Glean contains a helper function `glean.testing.reset_glean()` for this purpose.
+It has two required arguments: the application ID, and the application version.
+Each reset of Glean will create a new temporary directory for Glean to store its data in.
+This temporary directory is automatically cleaned up the next time Glean is reset or when the testing framework finishes.
+
+The instructions below assume you are using [pytest](https://pypi.org/project/pytest/) as the test runner.
+Other test-running libraries have similar features, but differ in the details.
+
+Create a file `conftest.py` at the root of your test directory, and add the following to reset Glean at the start of every test in your suite:
+
+```python
+import pytest
+from glean import testing
+
+@pytest.fixture(name="reset_glean", scope="function", autouse=True)
+def fixture_reset_glean():
+    testing.reset_glean("my-app-id", "0.1.0")
+```
+
+To check if a value exists (i.e. it has been recorded), there is a `test_has_value()` function on each of the metric instances:
+
+```python
+from glean import load_metrics
+metrics = load_metrics("metrics.yaml")
+
+# ...
+
+assert metrics.search.default_search_engine_url.test_has_value()
+```
+
+To check the actual values, there is a `test_get_value()` function on each of the metric instances.
+It is important to check that the values are recorded as expected, since many of the metric types may truncate or error-correct the value.
+This function will return a datatype appropriate to the specific type of the metric it is being used with:
+
+```python
+assert (
+    "https://example.com/search?" ==
+    metrics.search.default_search_engine_url.test_get_value()
+)
+```
+
+</div>
+
 {{#include ../tab_footer.md}}

 ## Testing metrics for custom pings

-In order to test metrics where the metric is included in more than one ping, the test functions take an optional `pingName` argument.
+In order to test metrics where the metric is included in more than one ping, the test functions take an optional `pingName` argument (`ping_name` in Python).
 This is the name of the ping that the metric is being sent in, such as `"events"` for the [`events` ping](pings/events.md),
 or `"metrics"` for the [`metrics` ping](pings/metrics.md).
 This could also be a custom ping name that the metric is being sent in.
@ -178,4 +224,24 @@ XCTAssertEqual("Courier", events[0].extra?["font"])

 </div>

+<div data-lang="Python" class="tab">
+
+Here is a longer example to better illustrate the intended use of the test API:
+
+```python
+from glean import load_metrics
+metrics = load_metrics("metrics.yaml")
+
+# Record a value into the metric to validate against
+metrics.url.visit.add(1)
+
+# Check that a value was recorded for the metric
+assert metrics.url.visit.test_has_value()
+
+# Check that the recorded value is the one we expect
+assert 1 == metrics.url.visit.test_get_value()
+```
+
+</div>
+
 {{#include ../tab_footer.md}}