From 467fe164ef2edf8f90e06f51c6e11800a2a0f053 Mon Sep 17 00:00:00 2001
From: Stefan Zabka
Date: Fri, 13 Mar 2020 16:14:43 +0100
Subject: [PATCH] Added type hints to development pointers

Also now autogenerating the ToC and removed trailing whitespace
---
 README.md | 43 ++++++++++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/README.md b/README.md
index 68e4ff35..8c408133 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,7 @@
-OpenWPM
+
+OpenWPM
 [![Build Status](https://travis-ci.org/mozilla/OpenWPM.svg?branch=master)](https://travis-ci.org/mozilla/OpenWPM)
-[![OpenWPM Matrix Channel](https://img.shields.io/matrix/OpenWPM:mozilla.org?label=Join%20us%20on%20matrix&server_fqdn=mozilla.modular.im)](https://matrix.to/#/!pFJihVSEWzcMCcOzSH:mozilla.org?via=mozilla.org)
+[![OpenWPM Matrix Channel](https://img.shields.io/matrix/OpenWPM:mozilla.org?label=Join%20us%20on%20matrix&server_fqdn=mozilla.modular.im)](https://matrix.to/#/!pFJihVSEWzcMCcOzSH:mozilla.org?via=mozilla.org)
 =======
 
 OpenWPM is a web privacy measurement framework which makes it easy to
@@ -9,21 +10,25 @@ of websites. OpenWPM is built on top of Firefox, with automation provided
 by Selenium. It includes several hooks for data collection. Check out
 the instrumentation section below for more details.
 
-Table of Contents
------------------
+Table of Contents
+------------------
 
 * [Installation](#installation)
 * [Quick Start](#quick-start)
 * [Instrumentation and Data Access](#instrumentation-and-data-access)
-* [Output Formats](#output-format)
- * [Local Databases](#local-databases)
- * [Parquet on Amazon S3 (Experimental)](#parquet-on-amazon-s3-experimental)
+* [Output Format](#output-format)
+ * [Local Databases](#local-databases)
+ * [Parquet on Amazon S3 **Experimental**](#parquet-on-amazon-s3-experimental)
 * [Browser and Platform Configuration](#browser-and-platform-configuration)
- * [Browser Configuration Options](#platform-configuration-options)
+ * [Platform Configuration Options](#platform-configuration-options)
+ * [Browser Configuration Options](#browser-configuration-options)
 * [Browser Profile Support](#browser-profile-support)
 * [Stateful vs Stateless crawls](#stateful-vs-stateless-crawls)
 * [Loading and saving a browser profile](#loading-and-saving-a-browser-profile)
+ * [Save a profile](#save-a-profile)
+ * [Load a profile](#load-a-profile)
 * [Development pointers](#development-pointers)
+ * [Types Annotations in Python](#types-annotations-in-python)
 * [Editing instrumentation](#editing-instrumentation)
 * [Debugging the platform](#debugging-the-platform)
 * [Managing requirements](#managing-requirements)
@@ -129,9 +134,9 @@ available [below](#output-format).
 LevelDB content database.
 * NOTE: this instrumentation may lead to performance issues when a large
 number of browsers are in use.
- * Set `browser_params['save_content']` to a comma-separated list of
+ * Set `browser_params['save_content']` to a comma-separated list of
 [resource_types](https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest/ResourceType)
- to save only specific types of files, for instance
+ to save only specific types of files, for instance
 `browser_params['save_content'] = "script"` to save only Javascript files.
 This will lessen the performance impact of this instrumentation when a
 large number of browsers are used in parallel.
@@ -220,8 +225,8 @@ inline by sending a `create_table` message to the data aggregator.
 #### Parquet on Amazon S3 **Experimental**
 As an option, OpenWPM can save data directly to an Amazon S3 bucket as a
 Parquet Dataset. This is currently experimental and hasn't been thoroughly
-tested. Screenshots, and page source saving is not currently supported and
-will still be stored in local databases and directories. To enable S3
+tested. Screenshots, and page source saving is not currently supported and
+will still be stored in local databases and directories. To enable S3
 saving specify the following configuration parameters in `manager_params`:
 * Output format: `manager_params['output_format'] = 's3'`
 * S3 bucket name: `manager_params['s3_bucket'] = 'openwpm-test-crawl'`
@@ -411,12 +416,20 @@ Much of OpenWPM's instrumentation is included in a Firefox add-on SDK
 extension. Thus, in order to add or change instrumentation you will need a few
 additional dependencies, which can be installed with `install-dev.sh`.
 
+### Types Annotations in Python
+
+We as maintainers have decided it would be helpful to have Python3 type annotations
+for the python part of this project to catch errors earlier, get better
+code completion and allow bigger changes down the line with more confidence.
+As such you should strive to add type annotations to all new code you add to
+the project as well as the one you plan to change fundamentally.
+
 ### Editing instrumentation
 
 The instrumentation extension is included in `/automation/Extension/firefox/`.
-The instrumentation itself (used by the above extension) is included in
+The instrumentation itself (used by the above extension) is included in
 `/automation/Extension/webext-instrumentation/`.
-Any edits within these directories will require the extension to be re-built to produce
+Any edits within these directories will require the extension to be re-built to produce
 a new `openwpm.xpi` with your updates. You can use `build_extension.sh` to do this.
 
 ### Debugging the platform
@@ -429,7 +442,7 @@ continuing the crawl). We recommend using
 This utility allows manual debugging of the extension instrumentation
 with or without Selenium enabled, as well as makes it easy to launch a
 Selenium instance (without any instrumentation)
-* `build-extension.sh`
+* `build-extension.sh`
 * `python -m test.manual_test` builds the current extension directory
 and launches a Firefox instance with it.
 * `python -m test.manual_test --selenium` launches a Firefox Selenium instance
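
The "Types Annotations in Python" section added by this patch asks contributors to annotate new code. A minimal sketch of what that might look like follows. The names `CrawlResult` and `summarize_visits` are hypothetical and are not taken from the OpenWPM codebase, and the patch does not name a specific type checker, so any tool mentioned in the comments is an assumption.

```python
from typing import Dict, List, NamedTuple, Optional


class CrawlResult(NamedTuple):
    """Hypothetical record type for illustration; not part of OpenWPM."""

    url: str
    status: Optional[int]  # None when the visit produced no HTTP response


def summarize_visits(results: List[CrawlResult]) -> Dict[str, int]:
    """Count visits by outcome; the annotations document the contract."""
    summary: Dict[str, int] = {"ok": 0, "failed": 0}
    for result in results:
        # Because status is Optional[int], a static checker (assumed: mypy)
        # would reject a comparison here without the None check first.
        if result.status is not None and 200 <= result.status < 400:
            summary["ok"] += 1
        else:
            summary["failed"] += 1
    return summary


if __name__ == "__main__":
    demo = [
        CrawlResult("https://example.com", 200),
        CrawlResult("https://example.org", None),
    ]
    print(summarize_visits(demo))
```

With annotations like these, passing a list of plain strings to `summarize_visits` would be reported by a static checker before the code runs, which is the "catch errors earlier" benefit the section describes.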