ETL code that produces the addons_daily derived dataset.
benmiroglio f102c2840e Move more python data to json 2019-05-28 18:57:17 -07:00
.circleci attemping a commit 2019-05-15 14:59:05 -07:00
addons_daily Move large python-defined data to files 2019-05-28 18:22:47 -07:00
tests Move more python data to json 2019-05-28 18:57:17 -07:00
utils reworking tests to pass pytest-black 2019-05-15 15:26:26 -07:00
.gitignore Reuse spark session, delete cache 2019-05-28 12:26:10 -07:00
METRICS.md Update METRICS.md 2019-03-25 14:03:23 -07:00
README.md attemping a commit 2019-05-15 14:59:05 -07:00
setup.py attemping a commit 2019-05-15 14:59:05 -07:00
tox.ini Add moztelemetry dependency 2019-05-28 14:08:25 -07:00

README.md

addons_daily Derived Dataset

Contributors: Sarah Melancon, Ben Miroglio, Brian Wright, Daniel Thorn

This ETL code produces daily aggregates of Firefox extensions. It supports the broader "Extension Data for Developers" project.

Deriving a Dataset

Extension data is stored in several places, the main sources being addons.mozilla.org (AMO), about:addons (disco-pane), and telemetry user data. We know a lot about extensions thanks to these sources, but it is difficult to get a bird's-eye view. From telemetry, we know how many users have extension X and their associated browser behavior. From Google Analytics dashboards, we know extension X got, say, 1000 views last week on AMO, and from telemetry we know it was installed 100 times. We need all of this data to live in one place, broken out by individual extension, to help support (first and foremost) product decisions around extensions.
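To make the "one place" idea concrete, here is a minimal sketch in plain Python that merges per-source records into one row per extension and day. It is not the pipeline in this repo: the field names (`addon_id`, `date`, `views`, `installs`, `dau`), the sample values, and the `combine_sources` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical per-source records; field names and values are illustrative only.
amo_views = [{"addon_id": "x@example", "date": "2019-05-28", "views": 1000}]
telemetry_installs = [{"addon_id": "x@example", "date": "2019-05-28", "installs": 100}]
telemetry_users = [{"addon_id": "x@example", "date": "2019-05-28", "dau": 5000}]


def combine_sources(*sources):
    """Merge rows from several sources into one dict per (addon_id, date)."""
    combined = defaultdict(dict)
    for source in sources:
        for row in source:
            key = (row["addon_id"], row["date"])
            # Keep every field except the two key fields as a metric.
            metrics = {k: v for k, v in row.items() if k not in ("addon_id", "date")}
            combined[key].update(metrics)
    return dict(combined)


daily = combine_sources(amo_views, telemetry_installs, telemetry_users)
```

Keying on the extension and day means each source can contribute its own metrics independently, and a missing source simply leaves those metrics absent for that row.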

This part of the project should allow someone to learn everything there is to know about extension X by looking in one place (at this point the “place” is a dataset). This repo accomplishes all of the above; the ideal next steps involve sharing these insights with a broader audience.

Sharing Insights

Developers get a narrow view of their extension in the wild through AMO's dashboards (e.g., see Privacy Badger's dashboard). These dashboards are great for tracking user uptake and language/OS breakdowns; however, they completely omit user behavior.

For instance, how long does it take extension X's browser action (the button on the Firefox toolbar) to load compared to the average? How many tabs do users with extension X typically use compared to the average? Compared to extension Y? These are just some examples of things a developer would be interested to know when prioritizing improvements to their software (and they are also very useful for Mozilla's internal use!).
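A comparison like "extension X versus the average" could be sketched as follows. This is an assumption-laden illustration, not code from this repo: the `avg_tabs` metric name, the sample values, and the `relative_to_average` helper are hypothetical, and the unweighted mean across extensions is just one possible definition of "the average".

```python
from statistics import mean

# Hypothetical per-extension rows; the metric name and values are made up.
rows = [
    {"addon_id": "x", "avg_tabs": 12.0},
    {"addon_id": "y", "avg_tabs": 8.0},
    {"addon_id": "z", "avg_tabs": 10.0},
]


def relative_to_average(rows, metric):
    """Return each extension's metric divided by the unweighted mean across extensions."""
    overall = mean(r[metric] for r in rows)
    return {r["addon_id"]: r[metric] / overall for r in rows}


ratios = relative_to_average(rows, "avg_tabs")
```

A ratio above 1 means users of that extension sit above the cross-extension mean for the metric. Weighting by user count would be a reasonable alternative if large extensions should dominate the baseline.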

This part of the project should first allow Mozilla to better understand, with minimal effort, how individual extensions are behaving by way of a dashboard/report. Ultimately, the hope is to share a subset of this data with developers/the public. One idea of what this might look like is a lightweight version of the public data report sans any commentary (since there will be hundreds of add-ons), or a more zoomed-out representation of TMO.