Change branch references to main in docs and comments
Parent: 70792f0ad3
Commit: 01825c82e8
@@ -1,6 +1,6 @@
 #!/bin/bash
-# NOTE: The version of this file on the master branch may also be pulled in
+# NOTE: The version of this file on the main branch may also be pulled in
 # by other projects (telemetry-streaming in particular), so keep that in mind
 # when proposing changes.

@@ -57,10 +57,10 @@ This dataset was used for experiment analysis, before it was deprecated in [Bug

 * [Removal PR](https://github.com/mozilla/telemetry-batch-view/pull/553)

-These jobs were reimplemented as BigQuery SQL in [bigquery-etl/sql/telemetry_derived/](https://github.com/mozilla/bigquery-etl/tree/master/sql/telemetry_derived/).
+These jobs were reimplemented as BigQuery SQL in [bigquery-etl/sql/telemetry_derived/](https://github.com/mozilla/bigquery-etl/).

 ## Experiments Summary

 * [Removal PR](https://github.com/mozilla/telemetry-batch-view/pull/558)

-This job was reimplemented as BigQuery SQL in [bigquery-etl/sql/telemetry_derived/experiments_v1/query.sql](https://github.com/mozilla/bigquery-etl/blob/master/sql/telemetry_derived/experiments_v1/query.sql).
+This job was reimplemented as BigQuery SQL in [bigquery-etl/sql/telemetry_derived/experiments_v1/query.sql](https://github.com/mozilla/bigquery-etl/).

README.md (10 changed lines)
@@ -2,8 +2,8 @@

 This is a Scala application to build derived datasets, also known as [batch views](http://robertovitillo.com/2016/01/06/batch-views/), of [Telemetry](https://wiki.mozilla.org/Telemetry) data.

-[![Build Status](https://travis-ci.org/mozilla/telemetry-batch-view.svg?branch=master)](https://travis-ci.org/mozilla/telemetry-batch-view)
-[![codecov.io](https://codecov.io/github/mozilla/telemetry-batch-view/coverage.svg?branch=master)](https://codecov.io/github/mozilla/telemetry-batch-view?branch=master)
+[![Build Status](https://travis-ci.org/mozilla/telemetry-batch-view.svg?branch=main)](https://travis-ci.org/mozilla/telemetry-batch-view)
+[![codecov.io](https://codecov.io/github/mozilla/telemetry-batch-view/coverage.svg?branch=main)](https://codecov.io/github/mozilla/telemetry-batch-view?branch=main)
 [![CircleCi Status](https://circleci.com/gh/mozilla/telemetry-batch-view.svg?style=shield&circle-token=ca31167ac42cc39f898e37facb93db70c0af8691)](https://circleci.com/gh/mozilla/telemetry-batch-view)

 Raw JSON [pings](https://ci.mozilla.org/job/mozilla-central-docs/Tree_Documentation/toolkit/components/telemetry/telemetry/pings.html) are stored on S3 within files containing [framed Heka records](https://hekad.readthedocs.org/en/latest/message/index.html#stream-framing). Reading the raw data in through e.g. Spark can be slow as for a given analysis only a few fields are typically used; not to mention the cost of parsing the JSON blobs. Furthermore, Heka files might contain only a handful of records under certain circumstances.

@@ -12,7 +12,7 @@ Defining a derived [Parquet](https://parquet.apache.org/) dataset, which uses a

 ### Adding a new derived dataset

-See the [views](https://github.com/mozilla/telemetry-batch-view/tree/master/src/main/scala/com/mozilla/telemetry/views) folder for examples of jobs that create derived datasets.
+See the [views](https://github.com/mozilla/telemetry-batch-view/tree/main/src/main/scala/com/mozilla/telemetry/views) folder for examples of jobs that create derived datasets.

 See the [Firefox Data Documentation](https://mozilla.github.io/firefox-data-docs/datasets/reference.html) for more information about the individual derived datasets.
 For help finding the right dataset for your analysis, see
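For a rough idea of what a job in that folder does, here is a minimal sketch of a derived-dataset job. It is hypothetical, not this repository's actual view structure or trait: only stock Spark SQL APIs are assumed, and the object name, sample data, and output path are invented for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical skeleton of a derived-dataset job: read input, aggregate, write Parquet.
object ExampleView {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ExampleView")
      .getOrCreate()
    import spark.implicits._

    // Illustrative input; a real job would read the telemetry source dataset instead.
    val pings = Seq(("client-a", 3L), ("client-b", 5L)).toDF("client_id", "session_count")

    // Derive a simple aggregate and persist it as Parquet, the format these views use.
    val derived = pings.groupBy($"client_id").sum("session_count")
    derived.write.mode("overwrite").parquet("/tmp/example_view.parquet")

    spark.stop()
  }
}
```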
@@ -60,7 +60,7 @@ sbt scalastyle test:scalastyle

 ### Generating Datasets

-See the [documentation for specific views](https://github.com/mozilla/telemetry-batch-view/tree/master/docs) for details about running/generating them.
+See the [documentation for specific views](https://github.com/mozilla/telemetry-batch-view/tree/main/docs) for details about running/generating them.

 For example, to create a longitudinal view locally:
 ```bash
@@ -100,4 +100,4 @@ System.setProperty("spark.sql.warehouse.dir", "file:///C:/somereal-dir/spark-war

 See [SPARK-10528](https://issues.apache.org/jira/browse/SPARK-10528). Run "winutils chmod 777 /tmp/hive" from a privileged prompt to make it work.

-Any commits to master should also trigger a circleci build that will do the sbt publishing for you to our local maven repo in s3.
+Any commits to main should also trigger a circleci build that will do the sbt publishing for you to our local maven repo in s3.

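As a side note on the Windows workaround referenced in the hunk header above: the `spark.sql.warehouse.dir` property has to be in place before the `SparkSession` is created. A minimal sketch, assuming Spark is on the classpath; the warehouse path and object name are arbitrary examples, not required locations.

```scala
import org.apache.spark.sql.SparkSession

object LocalSparkOnWindows {
  def main(args: Array[String]): Unit = {
    // Point the SQL warehouse at a real, writable directory before Spark starts,
    // working around SPARK-10528 on Windows; the path here is an example only.
    System.setProperty("spark.sql.warehouse.dir", "file:///C:/tmp/spark-warehouse")

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("telemetry-batch-view-local")
      .getOrCreate()

    // ... run the desired view job here ...

    spark.stop()
  }
}
```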
@@ -60,7 +60,7 @@ class DatasetShim private(dataset: String, attributes: Map[String, String]) {
       .map(row => parse(row.getString(0)))
       .map(doc => {
         // submission_timestamp is generated by the edge server and must be 'an ISO 8601 timestamp with microseconds and timezone "Z"'
-        // https://github.com/mozilla/gcp-ingestion/blob/master/docs/edge.md#edge-server-pubsub-message-schema
+        // https://mozilla.github.io/gcp-ingestion/architecture/edge_service_specification/#general-data-flow
         val JString(st) = doc \ "submission_timestamp"
         val submissionTimestamp = ZonedDateTime.parse(st, ISO_DATE_TIME)
         // provide doc \ "meta" to better match com.mozilla.telemetry.heka.Message.toJValue
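For context on the snippet above, the timestamp handling can be exercised standalone. A minimal sketch assuming json4s (jackson backend) and java.time; the sample ping fragment is made up, and only the `submission_timestamp` field name and parsing calls come from the diff.

```scala
import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter.ISO_DATE_TIME

import org.json4s._
import org.json4s.jackson.JsonMethods.parse

object SubmissionTimestampExample {
  def main(args: Array[String]): Unit = {
    // Illustrative ping fragment; real pings carry many more fields.
    val doc = parse("""{"submission_timestamp": "2020-06-15T12:34:56.789012Z"}""")

    // Same extraction as in DatasetShim: pull the string field and parse it
    // as an ISO 8601 timestamp with a "Z" offset.
    val JString(st) = doc \ "submission_timestamp"
    val submissionTimestamp = ZonedDateTime.parse(st, ISO_DATE_TIME)

    println(submissionTimestamp) // 2020-06-15T12:34:56.789012Z
  }
}
```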