Commit graph

59 commits

Author SHA1 Message Date
Georgia Kokkinou c51f9e56bf
Fix minor typos 2021-03-21 23:59:18 +02:00
Georgia Kokkinou daa6dba4e3
Enable stateful crawling and tests
Reenable stateful crawling and profile tests. Also, update the docs now
that stateful crawling is supported. Currently, stateful crawling is
broken, as geckodriver deletes the browser profile when closing or
crashing before we can archive it.
2021-03-01 17:18:00 +02:00
Stefan Zabka b29c3f4052
Data Aggregator Rewrite (#753)
* First steps in the rewrite

* Fixed import paths

* One giant refactor

* Fixing tests

* Adding mypy

* Removed mypy from pre-commit workflow

* First draft on DataAggregator

* Wrote a DataAggregator that starts and shuts down

* Created tests and added more empty types

* Got demo.py working

* Created sql_provider

* Cleaned up imports in TaskManager

* Added async

* Fixed minor bugs

* First steps at porting arrow

* Introduced TableName and different Task handling

* Added more failing tests

* First first completes others don't

* It works

* Started working on arrow_provider

* Implemented ArrowProvider

* Added logger fixture

* Fixed test_storage_controller

* Fixing OpenWPMTest.visit()

* Moved test/storage_providers to test/storage

* Fixing up tests

* Moved automation to openwpm

* Readded datadir to .gitignore

* Ran repin.sh

* Fixed formatting

* Let's see if this works

* Fixed imports

* Got arrow_memory_provider working

* Starting to rewrite tests

* Setting up fixtures

* Attempting to fix all the tests

* Still fixing tests

* Broken content saving

* Added node

* Fixed screenshot tests

* Fixing more tests

* Fixed tests

* Implemented local_storage.py

* Cleaned up flush_cache

* Fixing more tests

* Wrote test for LocalArrowProvider

* Introduced tests for local_storage_provider.py

* Asserting test dir is empty

* Creating subfolder for different aggregators

* New dependencies and init()

* Everything is terribly broken

* Figured out finalize_visit_id

* Running two event loops kinda works???

* Rearming the event

* Introduced mypy

* Downgraded black in pre-commit

* Modifying the database directly

* Fixed formatting

* Made mypy a lil stricter

* Fixing docs and config printing

* Realising I've been using the wrong with

* Trying to figure arrow_storage

* Moving lock initialization in in_memory_storage

* Fixing tests

* Fixing up tests and adding more typechecking

* Fixed num_browsers in test_cache_hits_recorded

* Parametrized unstructured

* String fix

* Added failing test

* New test

* Review changes with Steven

* Fixed repin.sh and test_arrow_cache

* Minor change

* Fixed prune-environment.py

* Removing references to DataAggregator

* Fixed test_seed_persistance

* More paths

* Fixed test display shutdown

* Made cache test more robust

* Update crawler.py

Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>

* Slimming down ManagerParams

* Fixing more tests

* Update test/storage/test_storage_controller.py

Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>

* Purging references to DataAggregator

* Reverted changes to .travis.yml

* Demo.py saves locally again

* Readjusting test paths

* Expanded comment on initialize to reference #846

* Made token optional in finalize_visit_id

* Simplified test parametrization

* Fixed callback semantics change

* Removed test_parse_http_stack_trace_str

* Added DataSocket

* WIP need to fix path encoding

* Fixed path encoding

* Added task and crawl to schema

* Fixed paths in GitHub actions

* Refactored completion handling

* Fix tests

* Trying to fix tests on CI

* Removed redundant setting of tag

* Removing references to S3

* Purging more DataAggregator references

* Cranking up logging to figure out test failure

* Moved test_values into a fixture

* Fixing GcpUnstructuredProvider

* Fixed paths for future crawls

* Renamed sqllite to official sqlite

* Restored demo.py

* Update openwpm/commands/profile_commands.py

Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>

* Restored previous behaviour of DumpProfileCommand

Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>

* Removed leftovers

* Cleaned up comments

* Expanded lock check

* Fixed more stuff

* More comment updates

* Update openwpm/socket_interface.py

Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>

* Removed outdated comment

* Using config_encoder

* Renamed tar_location to tar_path

* Removed references to database_name in docs

* Cleanup

* Moved screenshot_path and source_dump_path to ManagerParamsInternal

* Fixed imports

* Fixing up comments

* Fixing up comments

* More docs

* updated dependencies

* Fixed test_task_manager

* Reupgraded to python 3.9.1

* Restoring crawl_reference in mp_logger

* Removed unused imports

* Apply suggestions from code review

Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>

* Cleaned up socket handling

* Fixed TaskManager.__exit__

* Moved validation code into config.py

* Removed comment

* Removed comment

* Removed comment

Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>
2021-02-22 17:51:32 +01:00
Stefan Zabka e338bb29f0
Command refactoring (#750)
* Refactored GetCommand, BrowseCommand to have execute method

* Fixed type name format issues in __issue_command

* Fixed everything I broke

* Changed import style so tests can run

* Added BrowseCommand to imports

* Added some more self

* Added logging to explain failing test

* Added one more self

* attempt at refactoring save_screenshot

* fixed indentation, attempt at refactoring save_screenshot

* refactored SaveScreenshot command to have execute method

* reformatted code using black

* Ported SaveScreenshotCommand

It now uses the new command.execute(...) syntax

* refactored savefullscreenshot command to follow command sequence

* formatted files with black

* removed extraneous commands

* Ported SaveScreenshotFullPage #763

* refactored dump page source and formatted code with black

* reformatted recursive dump page source command and formatted code w black

* formatted files using isort

* formatted all files with isort

* Ported DumpPageSource and RecursiveDumpPageSource (#767)

* refactor finalize command

* refactored initialize command and formatted with black and isort

* missed a conflict

* Command refactoring (#770)

* attempt at refactoring save_screenshot

* fixed indentation, attempt at refactoring save_screenshot

* refactored SaveScreenshot command to have execute method

* reformatted code using black

* refactored savefullscreenshot command to follow command sequence

* formatted files with black

* removed extraneous commands

* refactored dump page source and formatted code with black

* reformatted recursive dump page source command and formatted code w black

* formatted files using isort

* formatted all files with isort

* refactor finalize command

* refactored initialize command and formatted with black and isort

* missed a conflict

* Ran isort

* Added append_command

* remove custom function command and format code

* Refactored GetCommand, BrowseCommand to have execute method

* Fixed type name format issues in __issue_command

* Fixed everything I broke

* Changed import style so tests can run

* Added BrowseCommand to imports

* Added some more self

* Added logging to explain failing test

* Added one more self

* Ported SaveScreenshotCommand

It now uses the new command.execute(...) syntax

* Ported SaveScreenshotFullPage #763

* Ported DumpPageSource and RecursiveDumpPageSource (#767)

* Command refactoring (#770)

* attempt at refactoring save_screenshot

* fixed indentation, attempt at refactoring save_screenshot

* refactored SaveScreenshot command to have execute method

* reformatted code using black

* refactored savefullscreenshot command to follow command sequence

* formatted files with black

* removed extraneous commands

* refactored dump page source and formatted code with black

* reformatted recursive dump page source command and formatted code w black

* formatted files using isort

* formatted all files with isort

* refactor finalize command

* refactored initialize command and formatted with black and isort

* missed a conflict

* Ran isort

* Added append_command

* remove duplicate append_command

* Refactored GetCommand, BrowseCommand to have execute method

* Fixed type name format issues in __issue_command

* Fixed everything I broke

* Changed import style so tests can run

* Added BrowseCommand to imports

* Added some more self

* Added logging to explain failing test

* Added one more self

* Ported SaveScreenshotCommand

It now uses the new command.execute(...) syntax

* Ported SaveScreenshotFullPage #763

* Ported DumpPageSource and RecursiveDumpPageSource (#767)

* Command refactoring (#770)

* attempt at refactoring save_screenshot

* fixed indentation, attempt at refactoring save_screenshot

* refactored SaveScreenshot command to have execute method

* reformatted code using black

* refactored savefullscreenshot command to follow command sequence

* formatted files with black

* removed extraneous commands

* refactored dump page source and formatted code with black

* reformatted recursive dump page source command and formatted code w black

* formatted files using isort

* formatted all files with isort

* refactor finalize command

* refactored initialize command and formatted with black and isort

* missed a conflict

* Ran isort

* Added append_command

* generate new xpi

* Fixing tests

* Fixing tests

* Fixing up more tests

* Removed type annotations

* Fixing tests

* Fixing tests

* Removed command_executor

* Moved Commands to commands

* Fixing imports

* Fixed skipped test

* Removed duplicate append_command

* docs: update adding command in usingOpenWPM

* Forgot to save

* Removed datadir

* Cleaning up imports

* Implemented simple command

* Added documentation to simple_command.py

* Renamed to custom_command.py

* Moved docs around

* Referencing BaseCommand.execute

* Update docs/Using_OpenWPM.md

Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>

Co-authored-by: Cyrus <cyruskarsan@gmail.com>
Co-authored-by: cyruskarsan <55566678+cyruskarsan@users.noreply.github.com>
Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
2021-01-09 11:15:01 +01:00
Ankush Dua db1186a9f6
Refactoring browser and manager params into dataclasses (#807)
* initial file commit

* add new dependency for dataclasses

* implemented basic BrowserParams dataclass

* dependencies update

* file reformat

* implemented basic ManagerParams dataclass

* Update environment dependencies

* Added new error class to validate
 browser and manager params

* file reformat

* Update scripts/environment-unpinned.yaml

Co-authored-by: Stefan Zabka <zabkaste@informatik.hu-berlin.de>

* added validations for BrowserParams dataclass

* Update openwpm/config.py

Co-authored-by: Stefan Zabka <zabkaste@informatik.hu-berlin.de>

* Removed unnecessary checks

Co-authored-by: Stefan Zabka <zabkaste@informatik.hu-berlin.de>

* Changed error string formatting

Co-authored-by: Stefan Zabka <zabkaste@informatik.hu-berlin.de>

* Changed filenamea and necessary imports to resolve conflicts with new master branch(refering to PEP-8 reformatting)

* Revert "Changed filenamea and necessary imports to resolve conflicts with new master branch(refering to PEP-8 reformatting)"

This reverts commit e550c3bd60.

* Revert "Merge branch 'master' into turn_browser_and_manager_params_into_dataclasses"

This reverts commit aff5a384e7, reversing
changes made to 6ecaf5d0a9.

* Revert "Update environment dependencies"

This reverts commit 385825b10a.

* Revert "Merge branch 'turn_browser_and_manager_params_into_dataclasses' of https://github.com/ankushduacodes/OpenWPM into turn_browser_and_manager_params_into_dataclasses"

This reverts commit 6ecaf5d0a9, reversing
changes made to e550c3bd60.

* file reformat

* finalized validate_browser_params function

* fixed typo in error string

* added validations for manager_params

* Explanation for using list for supported browser

* Revert "Revert "Merge branch 'master' into turn_browser_and_manager_params_into_dataclasses""

This reverts commit 6c3e98e57b.

* Revert "Revert "Changed filenamea and necessary imports to resolve conflicts with new master branch(refering to PEP-8 reformatting)""

This reverts commit fc8f48f187.

* import name change from .Error to .error

* moved call_instrument check to config.py

* fixed accidental use of dict syntax in a class

* moved save_content check from deploy_firefox.py

* deleting redundant file

* deleted more redundant files

* removed redundant imports

* added new save_content check

* property name change: variables cannot have '-'

* added new attribute  to ManagerParams

* adapted files to validate manager & browser params

- also added logic to convert the objects(BrowserParams and ManagerParams) to dictionaries to not break the functionality
- also updated demo.py to work with new file names on this branch

* removed obsolete documentation

* Dependency Update

* Revert "Dependency Update"

This reverts commit 8ee3a02b17.

* Dependencies Update

* unset memory and process watchdogs

* add new output_format and failure_limit checks

* inheriting dataclasses and added type hints to fn

* added todo

* fixed inheritance of dataclasses acc. to plan

* refactor use of dict to use dataclasses(pending)

* more refactoring use of dict to dataclasses -
Also changed some type hints related to new refactoring

* fixed screenshot directory issue -
because of which some of the tests were failing

* added try-except clause for unexpected errors

* added tests to cover dataclasses

* added some new and edited some old docs

* refactor use of __dict__ to dataclass.to_dict()

* Revert "refactor use of __dict__ to dataclass.to_dict()"

This reverts commit a4f35513fa.

* fixed some tests

* refactor use of __dict__ in favor of
dataclass.to_dict() method

* removed some TODOS

* fixed dataclasses validation tests

* Update docs/Configuration.md

Co-authored-by: Stefan Zabka <zabkaste@informatik.hu-berlin.de>

* Update docs/Configuration.md

Co-authored-by: Stefan Zabka <zabkaste@informatik.hu-berlin.de>

* Update docs/Configuration.md

Co-authored-by: Stefan Zabka <zabkaste@informatik.hu-berlin.de>

* Update openwpm/config.py

Co-authored-by: Stefan Zabka <zabkaste@informatik.hu-berlin.de>

* Update openwpm/config.py

Co-authored-by: Stefan Zabka <zabkaste@informatik.hu-berlin.de>

* Update openwpm/task_manager.py

Co-authored-by: Stefan Zabka <zabkaste@informatik.hu-berlin.de>

* minor fixed wrt polishing the PR

* added new check and test for crawl configs

Co-authored-by: Stefan Zabka <zabkaste@informatik.hu-berlin.de>
2020-12-02 10:10:45 +01:00
Ankush Dua d0c3466250
Resolving imports to avoid errors (#811)
* fixing attribute error

* import fixes
2020-11-25 20:33:56 +01:00
Fukurou Makoto 051a3846cb
Module & Imports conformed to PEP8 (#806)
* Module & Imports conformed to PEP8

* Conformed tests to PEP8

* Conformed tests to PEP8 (2)

* Updated webdriver test for PEP8

* Updated test_timer for PEP8

* Deleting Workspace file

* renamed files to match PEP8

* Update docs/Using_OpenWPM.md

Co-authored-by: Stefan Zabka <zabkaste@informatik.hu-berlin.de>

* Changed serversocket to ServerSocket

Co-authored-by: Stefan Zabka <zabkaste@informatik.hu-berlin.de>
2020-11-24 17:34:04 +01:00
Ankush Dua 502cd830ad
Renaming automation module to openwpm (#793) 2020-11-14 16:06:51 +01:00
Ankush Dua 2a971d0323
Moved process_watchdog and memory_watchdog into manager_params (#787)
fixes #749
Co-authored-by: vringar <szabka@mozilla.com>
2020-11-13 15:58:29 +01:00
ankushduacodes 65337e0c19
Making memory_watchdog not run by default (#785)
Closes #778
2020-11-10 15:53:54 +01:00
vringar 0491834020 Merge branch 'black' into master 2020-09-14 11:17:40 +02:00
vringar 635088b698 Restored demo.py 2020-09-14 11:00:01 +02:00
vringar 871edf48d3 Added test for dns instrument 2020-09-11 16:53:57 +02:00
vringar 0258ae527a Added black 2020-09-11 15:14:09 +02:00
Tobias Urban 243c54b1d0 added DNS resolution to the arrow/parquet schema 2020-08-17 09:06:06 +02:00
Tobias Urban 22780267f6 updated the demo and reverted one local change in the HTTP instrumentation 2020-08-05 11:36:28 +02:00
Tobias Urban dfa970f171 added DNS resolution support 2020-08-05 11:28:07 +02:00
Tobias Urban 1fe9deaa3d merge with latest upstream/master 2020-08-05 08:52:45 +02:00
Tobias Urban f29a271d9b resolving local conflicts 2020-08-04 16:40:11 +02:00
vringar 86ec0946e8 Made demo.py print the name of the visited site again 2020-05-11 14:47:25 +02:00
vringar be29cd5b70 Removed _cleanup_before_fail 2020-05-08 11:05:25 +02:00
Sarah Bird 404409433b Tweak demo.py 2020-05-07 18:27:52 -05:00
Sarah Bird 2f978c4cfb
Restore xvfb (#621)
* Make core changes reinstating Xvfb

* Latest requirements

Added pyvirtualdisplay but running pip-compile caused additional
upgrades.

* Default should not be headless

* Fix flake8

* Revert "Latest requirements"

This reverts commit 36989e963d.

* Manually add only pyvirtualdisplay

* Parametrize test_simple_commands for two display modes.

* flake8

* Rebalance tests

test_[a-d, d-e] and test_c both taking 5 minutes each can be combined.
Other tests hopefully taking ~10 min each.

* Update crawler.py and demo.py

* Add DISPLAY_MODE to sentry

* flake8

* Add extra info about display_modes
2020-05-05 13:43:21 -05:00
vringar d5d544f9a6 Porting cleanup and convenience forward 2020-04-20 21:28:13 +02:00
Kainaat Singh b4cc3d7609 Remove support for flash cookie saving 2020-04-03 16:34:19 +05:30
vringar daa9752030 Merge branch 'onSavedCallback' of github.com:mozilla/OpenWPM into onSavedCallback 2020-03-06 12:36:10 +01:00
Stefan Zabka 4e418b37b5 working callbacks 2020-02-28 17:39:39 +01:00
Stefan Zabka 10cbc5f957 Callstack capturing activated in demo.py 2020-01-17 16:40:13 +01:00
Stefan Zabka c3a240e789 removed unused range import and some empty lines 2019-11-20 14:56:46 +01:00
Stefan Zabka f0136dd78c removed all uses of six 2019-11-20 14:26:15 +01:00
Stefan Zabka 034b937b34 removed all __future__ 2019-11-20 10:54:57 +01:00
Steven Englehardt d88bdeaa76
Add closing parenthesis to comment 2019-09-26 06:22:48 -07:00
Sarah Bird 5e9b00145e Add a little more context around parallelization in demo.py 2019-09-17 08:23:09 -05:00
Fredrik Wollsén 560c6579a1 Avoid using "index='**'" in demo.py since it is a bit confusing 2019-08-13 12:03:42 +03:00
Victor Ng 62084b6995 More XVFB related cleanup
* Removed more display references in comments
* removed pyvirtualdisplay from requirements.txt
* removed flash support from demo.py
* dropped Makefile and docker-compose.yml
2019-08-08 13:04:18 -04:00
Fredrik Wollsén ec3411165c Abide by flake 2019-07-15 21:26:32 +03:00
Fredrik Wollsén beab5d318a Abide to mighty flake + bonus: clarify a comment in the demo and crawler scripts 2019-07-15 21:26:32 +03:00
englehardt e5b883364c Disable virtual display creation for MacOS 2019-07-03 08:25:32 -07:00
englehardt 451c3babf9 Allow stateful crawling, but provide a warning about profile loss. 2019-07-01 18:11:36 -07:00
Fredrik Wollsén fe11c53620 Enable navigation and js instruments for demo crawl 2019-06-27 16:44:54 +03:00
englehardt 01f2fba875 Remove dump_profile_cookies command.
This command is no longer necessary with the new instrumentation. It was
broken by the latest Firefox upgrade, so it no longer makes sense to
keep it around.
2019-06-12 08:07:55 -07:00
englehardt 78e6b3fb04 Fix isort failures 2019-04-16 10:47:17 -07:00
Nihanth Subramanya 16ee6b52f7 Update macos install script 2019-04-08 16:38:56 +02:00
Tobias Urban e37186d257 fixed missing logging output (Issue #258) 2019-02-10 16:02:54 +01:00
Tobias Urban ed90f64c5c fixed missing logging output (Issue #258) 2019-02-10 16:00:06 +01:00
englehardt e3cef0c65b Fixing isort issues, reclassifying six 2018-08-15 10:30:21 -04:00
Stephen Donner 99984e538e First big round of flake8 + isort fixes 2018-07-31 23:48:06 -07:00
englehardt 9dd05ecc83 PEP8 Fixes 2017-10-04 15:39:35 -04:00
Zack Weinberg 1c5d9356c0 Apply python-modernize + some hand tidy-ups.
This should get us 90% of the way to Python 3 support.
2017-03-09 11:00:54 -05:00
englehardt 5d86590149 Making extension-based HTTP instrumentation default and deprecating
proxy instrumentation.

The naming of sql tables and browser params have been updated to reflect
that the extension HTTP instrumentation is preferred to the proxy. A few other
notable changes:
(1) Extension HTTP instrumentation is preferred, but still off-by-default
(2) The proxy is now off-by-default and shouldn't be used.
(3) browser_params['save_javascript'] uses the extension, proxy-based
    javascript saving is controlled with browser_params['save_javascript_proxy']
(4) The "post processing pipeline" (which was only used to parse HTTP
    cookies) has been removed and the TaskManager::close API updated.
2016-12-05 12:32:55 -05:00