ETL code that produces the addons_daily derived dataset.
benmiroglio b1a947ed1d Update tests 2019-06-04 14:57:13 -07:00
.circleci Remove remaining py27 job 2019-05-30 16:00:00 -07:00
addons_daily Update tests 2019-06-04 14:57:13 -07:00
tests Update tests 2019-06-04 14:57:13 -07:00
.gitignore Change search metrics, do not use collect_list 2019-06-04 14:46:18 -07:00
METRICS.md Update METRICS.md 2019-03-25 14:03:23 -07:00
README.md Fix or xfail tests 2019-05-08 11:55:13 -07:00
setup.py Format python with black 2019-05-09 16:14:30 -07:00
tox.ini Adjust filters for production 2019-06-03 11:37:05 -07:00

README.md

addons_daily Derived Dataset

Contributors: Sarah Melancon, Ben Miroglio, Brian Wright, Daniel Thorn

This ETL code produces daily aggregates of Firefox extensions. It supports the broader "Extension Data for Developers" project.

Deriving a Dataset

Extension data is stored in several places, the main sources being addons.mozilla.org (AMO), about:addons (disco-pane), and telemetry user data. We know a lot about extensions thanks to these sources, but it is difficult to get a bird's-eye view. From telemetry, we know how many users have extension X and their associated browser behavior. From Google Analytics dashboards, we know extension X got, say, 1000 views last week on AMO, and from telemetry we know it was installed 100 times. We need all of this data to live in one place, broken out by individual extension, to help support (first and foremost) product decisions around extensions.
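
To make the "one place, broken out by individual extension" idea concrete, here is a minimal sketch in plain Python. It is not how the ETL in this repo is implemented; the add-on IDs, the metric names, the user counts, and the `combine_sources` helper are all hypothetical and only illustrate merging per-source aggregates into one row per extension.

```python
from collections import defaultdict

# Hypothetical per-source aggregates keyed by add-on ID (illustrative values only).
amo_views = {"extension-x": 1000, "extension-y": 250}
telemetry_installs = {"extension-x": 100, "extension-z": 40}
telemetry_users = {"extension-x": 5000, "extension-y": 1200, "extension-z": 300}


def combine_sources(**sources):
    """Merge {addon_id: value} maps from several sources into one row per add-on."""
    rows = defaultdict(dict)
    for metric_name, values in sources.items():
        for addon_id, value in values.items():
            rows[addon_id][metric_name] = value
    # Fill metrics that a source did not report with None,
    # so every row ends up with the same set of columns.
    return {
        addon_id: {name: metrics.get(name) for name in sources}
        for addon_id, metrics in rows.items()
    }


combined = combine_sources(
    amo_views=amo_views,
    installs=telemetry_installs,
    users=telemetry_users,
)
print(combined["extension-x"])
```

Keeping missing values as `None` rather than dropping the row means an extension that appears in only one source still shows up in the combined view.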

This part of the project should allow someone to learn everything there is to know about extension X by looking in one place (at this point the “place” is a dataset). This repo accomplishes all of the above; the ideal next steps involve sharing these insights with a broader audience.

Sharing Insights

Developers get a narrow view of their extension in the wild through AMO's dashboards (e.g., see Privacy Badger's dashboard). These dashboards are great for tracking user uptake and language/OS breakdowns; however, they completely omit user behavior.

For instance, how long does it take extension X's browser action (the button on the Firefox toolbar) to load compared to the average? How many tabs do users with extension X typically use compared to the average? Compared to extension Y? These are just some examples of things a developer would be interested to know when prioritizing improvements to their software (and they are also very useful for Mozilla's internal use!).
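
As a rough illustration of the "compared to the average" questions above, the sketch below expresses one extension's metric as a ratio to the average. The metric values and the `relative_to_average` helper are hypothetical, and the baseline here is a simple mean over per-extension values, which is a simplification and not necessarily the baseline this project uses (an average over all users would be weighted differently).

```python
from statistics import mean

# Hypothetical metric: average number of open tabs for users of each extension.
avg_tabs = {"extension-x": 12.0, "extension-y": 6.0, "extension-z": 6.0}


def relative_to_average(metric_by_addon, addon_id):
    """Return an add-on's metric as a ratio to the mean across all add-ons."""
    overall = mean(metric_by_addon.values())
    return metric_by_addon[addon_id] / overall


# A ratio above 1 means the extension's users sit above the average.
print(relative_to_average(avg_tabs, "extension-x"))
print(relative_to_average(avg_tabs, "extension-y"))
```

The same ratio can be used to compare two extensions directly, for example by dividing the value for extension X by the value for extension Y.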

This part of the project should first allow Mozilla to better understand, with minimal effort, how individual extensions are behaving by way of a dashboard/report. Eventually, the hope is to share a subset of this data with developers/the public. One idea of what this might look like is a lightweight version of the public data report sans any commentary (since there will be hundreds of add-ons), or a more zoomed-out representation of TMO.