ensemble-transposer re-formats existing data so that it can be used by the Firefox Public Data Report.
Mozilla already publishes raw data: numbers and identifiers. That's great, but it can be difficult to work with. ensemble-transposer takes that raw data, organizes it, adds useful information like explanations, and generates a series of files that are much easier for developers to work with. Ensemble, the platform that powers the Firefox Public Data Report, uses this improved and re-formatted data to build dashboards.
Other applications are also welcome to use the data that ensemble-transposer outputs. See the API documentation for more information.
ensemble-transposer can easily enhance any data that adheres to this format. It can also process Redash dashboards (see this example configuration file). Let us know if you have any questions or if you have a dataset that you would like us to spruce up.
API
Re-formatted data is currently hosted under the data.firefox.com domain, but you are also welcome to run ensemble-transposer yourself and host the re-formatted data elsewhere.
- Valid platform values: desktop
- Valid datasetName values: hardware, user-activity, usage-behavior
- Valid categoryName values: Listed in the output of the /datasets/[platform]/[datasetName] endpoint
- Valid metricName values: Listed in the output of the /datasets/[platform]/[datasetName] endpoint
/datasets/[platform]/[datasetName]/index.json
For example: https://data.firefox.com/datasets/desktop/user-activity/index.json
A summary of the given dataset. For example, this includes a description of the dataset and a list of all metrics within it.
/datasets/[platform]/[datasetName]/[categoryName]/[metricName]/index.json
For example: https://data.firefox.com/datasets/desktop/user-activity/Italy/YAU/index.json
Everything you need to know about a given metric in a given category. For example, this includes a title, a description, and a set of suggested axis labels.
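As a minimal sketch of how another application might address these two endpoints, the Python helpers below build the URLs from the path parameters. The function names are hypothetical and not part of ensemble-transposer. The base URL comes from the examples above, and the percent-encoding of path segments is an assumption for names that contain spaces.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the examples above; change it if you host the data elsewhere.
BASE_URL = "https://data.firefox.com"


def _segment(value: str) -> str:
    """Percent-encode a single path segment (assumption: needed for names with spaces)."""
    return quote(value, safe="")


def dataset_url(platform: str, dataset_name: str) -> str:
    """URL of /datasets/[platform]/[datasetName]/index.json."""
    return f"{BASE_URL}/datasets/{_segment(platform)}/{_segment(dataset_name)}/index.json"


def metric_url(platform: str, dataset_name: str, category_name: str, metric_name: str) -> str:
    """URL of /datasets/[platform]/[datasetName]/[categoryName]/[metricName]/index.json."""
    parts = [platform, dataset_name, category_name, metric_name]
    path = "/".join(_segment(part) for part in parts)
    return f"{BASE_URL}/datasets/{path}/index.json"


def fetch_json(url: str):
    """Download a URL and parse the response body as JSON (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)
```

For example, `fetch_json(dataset_url("desktop", "user-activity"))` would request the dataset summary, whose output lists the valid category and metric names to pass to `metric_url`.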
Development
Setup
- Install Docker
- Create a new Amazon S3 bucket
- Copy .env-dist to .env and provide values for all environment variables
Inspecting output
Run make start and inspect the data that is uploaded to S3.
Testing
Run make test to lint code and run standard tests.
Run make compare to compare the data in your S3 bucket to the data in the production S3 bucket. This can be useful when upgrading packages or refactoring code, for example.
Deployment
AWS
This project was originally meant to be run as a cloud task, like a Lambda function or Google Cloud Function. The main function is specified as the value of main in package.json. Most services read this value and do the right thing. If not, you may need to manually point your service to that function.
Before triggering the function, be sure to create an Amazon S3 bucket and set the following environment variables:
AWS_BUCKET_NAME
AWS_REGION
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
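Before triggering the function, it may help to confirm that all four variables are set. The following Python sketch is a hypothetical pre-flight check, not part of this project. Treating an empty value as missing is an assumption.

```python
import os

# The four variables listed above.
REQUIRED_AWS_VARS = (
    "AWS_BUCKET_NAME",
    "AWS_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_aws_vars(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All AWS environment variables are set.")
```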
Google Cloud
This project can be run as a Docker container. The default command is npm start, but it may need to be explicitly configured in some environments. When running the container in GKE, authentication will be automatically detected. Before running, be sure to create a Google Cloud Storage bucket and set the following environment variable:
GCS_BUCKET_NAME
Other
When neither AWS_BUCKET_NAME nor GCS_BUCKET_NAME is present in the environment, this project will write data to ./target, which can then be copied to otherwise unsupported systems.
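The fallback described above can be pictured with a short Python sketch. It illustrates the decision only and is not the project's actual code. The function name is hypothetical, the s3:// and gs:// prefixes are used purely as labels, and checking AWS before Google Cloud when both variables are set is an assumption.

```python
def output_destination(env):
    """Describe where data would be written for a given environment mapping."""
    aws_bucket = env.get("AWS_BUCKET_NAME")
    gcs_bucket = env.get("GCS_BUCKET_NAME")
    if aws_bucket:
        # Assumption: AWS is checked first when both variables are present.
        return "s3://" + aws_bucket
    if gcs_bucket:
        return "gs://" + gcs_bucket
    # Neither variable is present: fall back to the local directory.
    return "./target"
```

Calling `output_destination(dict(os.environ))` after `import os` reports the destination implied by the current environment.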
Notes
Versioning
We maintain a version number for this project in package.json. It should be incremented whenever new code is pushed.
The number looks like a semantic version number, but semver isn't meant for applications. We instead follow these basic guidelines: the first number is incremented for major changes, the second number is incremented for medium-sized changes, and the third number is incremented for small changes.
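These guidelines can be expressed as a small Python sketch. The helper name and the change labels are hypothetical. Resetting the lower numbers to zero after a bump follows common practice and is an assumption, since the guidelines do not say.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version for a 'major', 'medium', or 'small' change."""
    major, medium, small = (int(part) for part in version.split("."))
    if change == "major":
        # Assumption: lower numbers reset to zero.
        return f"{major + 1}.0.0"
    if change == "medium":
        return f"{major}.{medium + 1}.0"
    if change == "small":
        return f"{major}.{medium}.{small + 1}"
    raise ValueError(f"unknown change size: {change!r}")
```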