[doc only] Document memory distribution metric type (#243)

* [doc only] Document memory distribution metric type
* Fix copy-pasta
* Better describe the filling in of missing values in payloads
* Address comments in the PR
* Update docs/user/metrics/memory_distribution.md

Co-Authored-By: Jan-Erik Rediger <badboy@archlinux.us>
2019-08-14 10:12:53 -04:00 · 2019-08-14 10:12:53 -04:00 · 864aa9a239
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -18,6 +18,7 @@
        - [String List](user/metrics/string_list.md)
        - [Timespan](user/metrics/timespan.md)
        - [Timing Distribution](user/metrics/timing_distribution.md)
+        - [Memory Distribution](user/metrics/memory_distribution.md)
        - [UUID](user/metrics/uuid.md)
        - [Datetime](user/metrics/datetime.md)
        - [Event](user/metrics/event.md)
--- a/docs/dev/core/internal/payload.md
+++ b/docs/dev/core/internal/payload.md
@@ -75,7 +75,45 @@ A [Timing distribution](../../../user/metrics/timing_distribution.md) is represe
 | Field name | Type | Description |
 |---|---|---|
 | `sum` | Integer | The sum of all recorded values. |
-| `values` | Map&lt;String, Integer&gt; | The values in each bucket. The key is the minimum value for the range of that bucket. Buckets with no values are not reported. |
+| `values` | Map&lt;String, Integer&gt; | The values in each bucket. The key is the minimum value for the range of that bucket. |
+
+A contiguous range of buckets is always sent so that the server can aggregate and visualize distributions without knowing anything about the specific bucketing function used.
+This range starts with the first bucket with a non-zero accumulation, and ends at one bucket beyond the last bucket with a non-zero accumulation (so that the upper bound on the last bucket is retained).
+
+For example, the following shows the recorded values vs. what is sent in the payload.
+
+```
+recorded:  1024: 2, 1116: 1,                   1448: 1,
+sent:      1024: 2, 1116: 1, 1217: 0, 1327: 0, 1448: 1, 1579: 0
+```
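+
+The fill-in rule above can be sketched as follows, assuming the full sorted list of bucket minimums is available. `fillContiguous` and its parameters are hypothetical names used only for illustration; they are not part of the Glean API.
+
+```kotlin
+// Hypothetical helper, not part of the Glean API.
+// `bucketMins` is assumed to be the sorted list of every bucket's minimum value.
+fun fillContiguous(recorded: Map<Long, Long>, bucketMins: List<Long>): Map<Long, Long> {
+    // Indices of buckets that hold at least one recorded value.
+    val nonZero = bucketMins.indices.filter { (recorded[bucketMins[it]] ?: 0L) > 0L }
+    if (nonZero.isEmpty()) return emptyMap()
+    // End one bucket past the last non-zero bucket, if such a bucket exists.
+    val end = minOf(nonZero.last() + 1, bucketMins.lastIndex)
+    return (nonZero.first()..end).associate { i ->
+        bucketMins[i] to (recorded[bucketMins[i]] ?: 0L)
+    }
+}
+
+fun main() {
+    // Bucket minimums and recorded values taken from the example above.
+    val mins = listOf(1024L, 1116L, 1217L, 1327L, 1448L, 1579L)
+    val recorded = mapOf(1024L to 2L, 1116L to 1L, 1448L to 1L)
+    println(fillContiguous(recorded, mins))
+    // {1024=2, 1116=1, 1217=0, 1327=0, 1448=1, 1579=0}
+}
+```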
+
+#### Example:
+
+```json
+{
+    "sum": 3,
+    "values": {
+        "1024": 2,
+        "1116": 1,
+        "1217": 0,
+        "1327": 0,
+        "1448": 1,
+        "1579": 0
+    }
+}
+```
+
+### Memory Distribution
+
+A [Memory distribution](../../../user/metrics/memory_distribution.md) is represented as an object with the following fields.
+
+| Field name | Type | Description |
+|---|---|---|
+| `sum` | Integer | The sum of all recorded values. |
+| `values` | Map&lt;String, Integer&gt; | The values in each bucket. The key is the minimum value for the range of that bucket. |
+
+A contiguous range of buckets is always sent.
+See [timing distribution](#timing-distribution) for more details.

 #### Example:

@@ -157,7 +195,24 @@ A [Custom distribution](../../../user/metrics/custom_distribution.md) is represe
 | Field name | Type | Description |
 |---|---|---|
 | `sum` | Integer | The sum of all recorded values. |
-| `values` | Map&lt;String, Integer&gt; | The values in each bucket. The key is the minimum value for the range of that bucket. All buckets [0, max) are reported, so that the histograms can be aggregated in the pipeline without the pipeline knowing anything about the distribution of the buckets. | 
+| `values` | Map&lt;String, Integer&gt; | The values in each bucket. The key is the minimum value for the range of that bucket. |
+
+A contiguous range of buckets is always sent so that the server can aggregate and visualize distributions without knowing anything about the specific bucketing function used.
+This range starts with the first bucket (as specified in the `range_min` parameter), and ends at one bucket beyond the last bucket with a non-zero accumulation (so that the upper bound on the last bucket is retained).
+
+For example, suppose you had a custom distribution defined by the following parameters:
+
+  - `range_min`: 10
+  - `range_max`: 200
+  - `bucket_count`: 80
+  - `histogram_type`: `'linear'`
+
+The following shows the recorded values vs. what is sent in the payload.
+
+```
+recorded:        12: 2,                      22: 1
+sent:     10: 0, 12: 2, 14: 0, 17: 0, 19: 0, 22: 1, 24: 0
+```
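+
+The bucket minimums in this example are consistent with evenly spaced integer boundaries between `range_min` and `range_max`. The sketch below uses a formula that is an assumption, chosen because it reproduces the keys shown above. `linearBucketMins` is a hypothetical helper and should not be read as the Glean implementation.
+
+```kotlin
+// Hypothetical helper: computes the first `n` bucket minimums of a linear
+// histogram with integer arithmetic. The formula is an assumption.
+fun linearBucketMins(rangeMin: Long, rangeMax: Long, bucketCount: Int, n: Int): List<Long> =
+    (1..n).map { i ->
+        (rangeMin * (bucketCount - 1 - i) + rangeMax * (i - 1)) / (bucketCount - 2)
+    }
+
+fun main() {
+    // Parameters from the example above: range_min 10, range_max 200, bucket_count 80.
+    println(linearBucketMins(10L, 200L, 80, 7))
+    // [10, 12, 14, 17, 19, 22, 24]
+}
+```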

 #### Example:

@@ -165,8 +220,13 @@ A [Custom distribution](../../../user/metrics/custom_distribution.md) is represe
 {
    "sum": 3,
    "values": {
-        "0": 1,
-        "1": 3,
+        "10": 0,
+        "12": 2,
+        "14": 0,
+        "17": 0,
+        "19": 0,
+        "22": 1,
+        "24": 0
    }
 }
 ```
--- a/docs/user/metrics/index.md
+++ b/docs/user/metrics/index.md
@@ -18,6 +18,8 @@ There are different metrics to choose from, depending on what you want to achiev

 * [Timing Distribution](timing_distribution.md): Used to record the distribution of multiple time measurements.

+* [Memory Distribution](memory_distribution.md): Used to record the distribution of memory sizes.
+
 * [UUID](uuid.md): Used to record universally unique identifiers (UUIDs), such as a client ID.

 * [Datetime](datetime.md): Used to record an absolute date and time, such as the time the user first ran the application.
--- a/docs/user/metrics/memory_distribution.md
+++ b/docs/user/metrics/memory_distribution.md
@@ -0,0 +1,72 @@
+# Memory Distribution
+
+Memory distributions are used to accumulate and store memory sizes.
+
+Memory distributions are recorded in a histogram where the buckets have an exponential distribution, specifically with 16 buckets for every power of 2.
+This makes them suitable for measuring memory sizes on a number of different scales without any configuration.
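+
+To illustrate what "16 buckets for every power of 2" means, here is a minimal sketch that maps a size to the minimum of its bucket. The exact rounding is an assumption, and `bucketMin` is a hypothetical name rather than a Glean API.
+
+```kotlin
+import kotlin.math.floor
+import kotlin.math.log2
+import kotlin.math.pow
+
+// 16 buckets for every power of 2, as described above.
+const val BUCKETS_PER_POWER_OF_2 = 16.0
+
+// Hypothetical helper: returns the minimum value of the bucket containing
+// `sizeInBytes`. Flooring the index and the boundary is an assumption.
+fun bucketMin(sizeInBytes: Long): Long {
+    if (sizeInBytes < 1) return 0
+    val index = floor(log2(sizeInBytes.toDouble()) * BUCKETS_PER_POWER_OF_2)
+    return floor(2.0.pow(index / BUCKETS_PER_POWER_OF_2)).toLong()
+}
+
+fun main() {
+    // 1500 lies between 2^(168/16) (about 1448.15) and 2^(169/16) (about 1512.2).
+    println(bucketMin(1500L)) // 1448
+}
+```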
+
+## Configuration
+
+To create a memory distribution that measures the amount of heap memory allocated, first add an entry for it to the `metrics.yaml` file:
+
+```YAML
+memory:
+  heap_allocated:
+    type: memory_distribution
+    description: >
+      The heap memory allocated
+    memory_unit: kilobyte
+    ...
+```
+
+## API
+
+Now you can use the memory distribution from the application's code.
+
+For example, to measure the distribution of heap allocations:
+
+```Kotlin
+import org.mozilla.yourApplication.GleanMetrics.Memory
+
+fun allocateMemory(nbytes: Int) {
+    // ...
+    Memory.heapAllocated.accumulate(nbytes / 1024)
+}
+```
+
+There are test APIs available too. For convenience, the properties `sum` and `count` are exposed to help validate that data was recorded correctly.
+
+Continuing the `heapAllocated` example above, at this point the metric should have `sum == 11` and `count == 2`:
+
+```Kotlin
+import org.mozilla.yourApplication.GleanMetrics.Memory
+
+// Was anything recorded?
+assertTrue(Memory.heapAllocated.testHasValue())
+
+// Get snapshot
+val snapshot = Memory.heapAllocated.testGetValue()
+
+// Does the sum have the expected value?
+assertEquals(11, snapshot.sum)
+
+// Usually you don't know the exact memory values, but you do know how many should have been recorded.
+assertEquals(2L, snapshot.count())
+```
+
+## Limits
+
+* The maximum memory size that can be recorded is 1 Terabyte (2<sup>40</sup> bytes). Larger sizes will be truncated to 1 Terabyte.
+
+## Examples
+
+* What is the distribution of the size of heap allocations?
+
+## Recorded errors
+
+* `invalid_value`: If recording a negative memory size.
+* `invalid_value`: If recording a size larger than 1TB.
+
+## Reference
+
+* See [Kotlin API docs](../../../javadoc/glean/mozilla.telemetry.glean.private/-memory-distribution-metric-type/index.html)