OpenWPM

Commit graph

Author

SHA1

Message

Date

Stefan Zabka

b95e9d2975

Stop AssertionErrors crashing production crawls (#945)

* Stop AssertionErrors crashing production crawls

Fixes #166

* Wrote tests for propagating exceptions

* Logs were too noisy

* Test_crawl should run like a real crawl

2021-07-16 18:32:51 +02:00

Stefan Zabka

b29c3f4052

Data Aggregator Rewrite (#753)

* First steps in the rewrite

* Fixed import paths

* One giant refactor

* Fixing tests

* Adding mypy

* Removed mypy from pre-commit workflow

* First draft on DataAggregator

* Wrote a DataAggregator that starts and shuts down

* Created tests and added more empty types

* Got demo.py working

* Created sql_provider

* Cleaned up imports in TaskManager

* Added async

* Fixed minor bugs

* First steps at porting arrow

* Introduced TableName and different Task handling

* Added more failing tests

* First first completes others don't

* It works

* Started working on arrow_provider

* Implemented ArrowProvider

* Added logger fixture

* Fixed test_storage_controller

* Fixing OpenWPMTest.visit()

* Moved test/storage_providers to test/storage

* Fixing up tests

* Moved automation to openwpm

* Readded datadir to .gitignore

* Ran repin.sh

* Fixed formatting

* Let's see if this works

* Fixed imports

* Got arrow_memory_provider working

* Starting to rewrite tests

* Setting up fixtures

* Attempting to fix all the tests

* Still fixing tests

* Broken content saving

* Added node

* Fixed screenshot tests

* Fixing more tests

* Fixed tests

* Implemented local_storage.py

* Cleaned up flush_cache

* Fixing more tests

* Wrote test for LocalArrowProvider

* Introduced tests for local_storage_provider.py

* Asserting test dir is empty

* Creating subfolder for different aggregators

* New depencies and init()

* Everything is terribly broken

* Figured out finalize_visit_id

* Running two event loops kinda works???

* Rearming the event

* Introduced mypy

* Downgraded black in pre-commit

* Modifying the database directly

* Fixed formatting

* Made mypy a lil stricter

* Fixing docs and config printing

* Realising I've been using the wrong with

* Trying to figure arrow_storage

* Moving lock initialization in in_memory_storage

* Fixing tests

* Fixing up tests and adding more typechecking

* Fixed num_browsers in test_cache_hits_recorded

* Parametrized unstructured

* String fix

* Added failing test

* New test

* Review changes with Steven

* Fixed repin.sh and test_arrow_cache

* Minor change

* Fixed prune-environment.py

* Removing references to DataAggregator

* Fixed test_seed_persistance

* More paths

* Fixed test display shutdown

* Made cache test more robust

* Update crawler.py

Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>

* Slimming down ManagerParams

* Fixing more tests

* Update test/storage/test_storage_controller.py

Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>

* Purging references to DataAggregator

* Reverted changes to .travis.yml

* Demo.py saves locally again

* Readjusting test paths

* Expanded comment on initialize to reference #846

* Made token optional in finalize_visit_id

* Simplified test paramtetrization

* Fixed callback semantics change

* Removed test_parse_http_stack_trace_str

* Added DataSocket

* WIP need to fix path encoding

* Fixed path encoding

* Added task and crawl to schema

* Fixed paths in GitHub actions

* Refactored completion handling

* Fix tests

* Trying to fix tests on CI

* Removed redundant setting of tag

* Removing references to S3

* Purging more DataAggregator references

* Craking up logging to figure out test failure

* Moved test_values into a fixture

* Fixing GcpUnstructuredProvider

* Fixed paths for future crawls

* Renamed sqllite to official sqlite

* Restored demo.py

* Update openwpm/commands/profile_commands.py

Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>

* Restored previous behaviour of DumpProfileCommand

Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>

* Removed leftovers

* Cleaned up comments

* Expanded lock check

* Fixed more stuff

* More comment updates

* Update openwpm/socket_interface.py

Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>

* Removed outdated comment

* Using config_encoder

* Renamed tar_location to tar_path

* Removed references to database_name in docs

* Cleanup

* Moved screenshot_path and source_dump_path to ManagerParamsInternal

* Fixed imports

* Fixing up comments

* Fixing up comments

* More docs

* updated dependencies

* Fixed test_task_manager

* Reupgraded to python 3.9.1

* Restoring crawl_reference in mp_logger

* Removed unused imports

* Apply suggestions from code review

Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>

* Cleaned up socket handling

* Fixed TaskManager.__exit__

* Moved validation code into config.py

* Removed comment

* Removed comment

* Removed comment

Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>

2021-02-22 17:51:32 +01:00

Georgia Kokkinou

f9e38a396b

Fix behavior of failure_limit (#854)

* Set failure_limit default in config.py

Set the default value for failure_limit in config.py instead of setting
it in the TaskManager. Also, accept 0 as a valid value for
failure_limit.

* Rename failurecount to failure_count

* Clear failure_count on successful command sequence

Reset failure_count to 0 only at the end of each successfully completed
command sequence. Before it was reset after each successful command.
This would result in failures of subsequent commands that belonged to
different command sequences not triggering a CommandExecutionError
because failure_count was reset upon every InitializeCommand. Also,
update the docs to reflect the current behavior of failure_limit.

Closes #851

* Enable test_crash

* Fix some minor typos

* Test failure limit behavior

Move `test_crash` out of test_profile.py, as it does not depend on
profile saving support and rename it to `test_failure_limit_exceeded`,
which is more descriptive. Also, add two more tests to cover more
aspects of failure limit behavior.

* Use local test server in TaskManager tests

* Clarify failure_limit behavior in docs

Mention that the CommandExecutionError gets raised by the next command
sequence after failure_limit has been exceeded.

* Add type annotations for failure_limit property

2021-02-15 15:59:57 +01:00
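
The failure_limit behavior described in the commit message above can be illustrated with a small sketch. This is not OpenWPM's code: `FailureTracker`, its method names, and the stand-in `CommandExecutionError` class are hypothetical. It also assumes that one failure is counted per failed command sequence and that "exceeded" means `failure_count > failure_limit`.

```python
class CommandExecutionError(Exception):
    """Stand-in for the error named in the commit message (not OpenWPM's class)."""


class FailureTracker:
    """Hypothetical helper that tracks failures across command sequences."""

    def __init__(self, failure_limit: int) -> None:
        # 0 is accepted as a valid limit; only negative values are rejected.
        if failure_limit < 0:
            raise ValueError("failure_limit must be >= 0")
        self.failure_limit = failure_limit
        self.failure_count = 0

    def check_before_sequence(self) -> None:
        # The error surfaces when the *next* command sequence is about to
        # start, after the limit has already been exceeded.
        if self.failure_count > self.failure_limit:
            raise CommandExecutionError(
                f"{self.failure_count} failures exceed the limit of "
                f"{self.failure_limit}"
            )

    def record_sequence(self, succeeded: bool) -> None:
        # Assumption: one failure is counted per failed sequence. The counter
        # is cleared only when a whole sequence completes successfully, not
        # after each individual command.
        if succeeded:
            self.failure_count = 0
        else:
            self.failure_count += 1
```

With `failure_limit=0`, a single failed sequence makes the following call to `check_before_sequence()` raise; with a higher limit, any fully successful sequence resets the counter.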

3 commits