OpenWPM/test/test_custom_function_comman...

109 строки
3.2 KiB
Python
Исходник Обычный вид История

Data Aggregator Rewrite (#753) * First steps in the rewrite * Fixed import paths * One giant refactor * Fixing tests * Adding mypy * Removed mypy from pre-commit workflow * First draft on DataAggregator * Wrote a DataAggregator that starts and shuts down * Created tests and added more empty types * Got demo.py working * Created sql_provider * Cleaned up imports in TaskManager * Added async * Fixed minor bugs * First steps at porting arrow * Introduced TableName and different Task handling * Added more failing tests * First first completes others don't * It works * Started working on arrow_provider * Implemented ArrowProvider * Added logger fixture * Fixed test_storage_controller * Fixing OpenWPMTest.visit() * Moved test/storage_providers to test/storage * Fixing up tests * Moved automation to openwpm * Readded datadir to .gitignore * Ran repin.sh * Fixed formatting * Let's see if this works * Fixed imports * Got arrow_memory_provider working * Starting to rewrite tests * Setting up fixtures * Attempting to fix all the tests * Still fixing tests * Broken content saving * Added node * Fixed screenshot tests * Fixing more tests * Fixed tests * Implemented local_storage.py * Cleaned up flush_cache * Fixing more tests * Wrote test for LocalArrowProvider * Introduced tests for local_storage_provider.py * Asserting test dir is empty * Creating subfolder for different aggregators * New depencies and init() * Everything is terribly broken * Figured out finalize_visit_id * Running two event loops kinda works??? * Rearming the event * Introduced mypy * Downgraded black in pre-commit * Modifying the database directly * Fixed formatting * Made mypy a lil stricter * Fixing docs and config printing * Realising I've been using the wrong with * Trying to figure arrow_storage * Moving lock initialization in in_memory_storage * Fixing tests * Fixing up tests and adding more typechecking * Fixed num_browsers in test_cache_hits_recorded * Parametrized unstructured * String fix * Added failing test * New test * Review changes with Steven * Fixed repin.sh and test_arrow_cache * Minor change * Fixed prune-environment.py * Removing references to DataAggregator * Fixed test_seed_persistance * More paths * Fixed test display shutdown * Made cache test more robust * Update crawler.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Slimming down ManagerParams * Fixing more tests * Update test/storage/test_storage_controller.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Purging references to DataAggregator * Reverted changes to .travis.yml * Demo.py saves locally again * Readjusting test paths * Expanded comment on initialize to reference #846 * Made token optional in finalize_visit_id * Simplified test paramtetrization * Fixed callback semantics change * Removed test_parse_http_stack_trace_str * Added DataSocket * WIP need to fix path encoding * Fixed path encoding * Added task and crawl to schema * Fixed paths in GitHub actions * Refactored completion handling * Fix tests * Trying to fix tests on CI * Removed redundant setting of tag * Removing references to S3 * Purging more DataAggregator references * Craking up logging to figure out test failure * Moved test_values into a fixture * Fixing GcpUnstructuredProvider * Fixed paths for future crawls * Renamed sqllite to official sqlite * Restored demo.py * Update openwpm/commands/profile_commands.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Restored previous behaviour of DumpProfileCommand Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed leftovers * Cleaned up comments * Expanded lock check * Fixed more stuff * More comment updates * Update openwpm/socket_interface.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed outdated comment * Using config_encoder * Renamed tar_location to tar_path * Removed references to database_name in docs * Cleanup * Moved screenshot_path and source_dump_path to ManagerParamsInternal * Fixed imports * Fixing up comments * Fixing up comments * More docs * updated dependencies * Fixed test_task_manager * Reupgraded to python 3.9.1 * Restoring crawl_reference in mp_logger * Removed unused imports * Apply suggestions from code review Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Cleaned up socket handling * Fixed TaskManager.__exit__ * Moved validation code into config.py * Removed comment * Removed comment * Removed comment Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>
2021-02-22 19:51:32 +03:00
import sqlite3
Command refactoring (#750) * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * Ported SaveScreenshotFullPage #763 * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * Ported DumpPageSource and RecursiveDumpPageSource (#767) * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove custom function command and format code * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove duplicate append_command * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * generate new xpi * Fixing tests * Fixing tests * Fixing up more tests * Removed type annotations * Fixing tests * Fixing tests * Removed command_executor * Moved Commands to commands * Fixing imports * Fixed skipped test * Removed duplicate append_command * docs: update adding command in usingOpenWPM * Forgot to save * Removed datadir * Cleaning up imports * Implemented simple command * Added documentation to simple_command.py * Renamed to custom_command.py * Moved docs around * Referencing BaseCommand.execute * Update docs/Using_OpenWPM.md Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Cyrus <cyruskarsan@gmail.com> Co-authored-by: cyruskarsan <55566678+cyruskarsan@users.noreply.github.com> Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
2021-01-09 13:15:01 +03:00
from selenium.webdriver import Firefox
Data Aggregator Rewrite (#753) * First steps in the rewrite * Fixed import paths * One giant refactor * Fixing tests * Adding mypy * Removed mypy from pre-commit workflow * First draft on DataAggregator * Wrote a DataAggregator that starts and shuts down * Created tests and added more empty types * Got demo.py working * Created sql_provider * Cleaned up imports in TaskManager * Added async * Fixed minor bugs * First steps at porting arrow * Introduced TableName and different Task handling * Added more failing tests * First first completes others don't * It works * Started working on arrow_provider * Implemented ArrowProvider * Added logger fixture * Fixed test_storage_controller * Fixing OpenWPMTest.visit() * Moved test/storage_providers to test/storage * Fixing up tests * Moved automation to openwpm * Readded datadir to .gitignore * Ran repin.sh * Fixed formatting * Let's see if this works * Fixed imports * Got arrow_memory_provider working * Starting to rewrite tests * Setting up fixtures * Attempting to fix all the tests * Still fixing tests * Broken content saving * Added node * Fixed screenshot tests * Fixing more tests * Fixed tests * Implemented local_storage.py * Cleaned up flush_cache * Fixing more tests * Wrote test for LocalArrowProvider * Introduced tests for local_storage_provider.py * Asserting test dir is empty * Creating subfolder for different aggregators * New depencies and init() * Everything is terribly broken * Figured out finalize_visit_id * Running two event loops kinda works??? * Rearming the event * Introduced mypy * Downgraded black in pre-commit * Modifying the database directly * Fixed formatting * Made mypy a lil stricter * Fixing docs and config printing * Realising I've been using the wrong with * Trying to figure arrow_storage * Moving lock initialization in in_memory_storage * Fixing tests * Fixing up tests and adding more typechecking * Fixed num_browsers in test_cache_hits_recorded * Parametrized unstructured * String fix * Added failing test * New test * Review changes with Steven * Fixed repin.sh and test_arrow_cache * Minor change * Fixed prune-environment.py * Removing references to DataAggregator * Fixed test_seed_persistance * More paths * Fixed test display shutdown * Made cache test more robust * Update crawler.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Slimming down ManagerParams * Fixing more tests * Update test/storage/test_storage_controller.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Purging references to DataAggregator * Reverted changes to .travis.yml * Demo.py saves locally again * Readjusting test paths * Expanded comment on initialize to reference #846 * Made token optional in finalize_visit_id * Simplified test paramtetrization * Fixed callback semantics change * Removed test_parse_http_stack_trace_str * Added DataSocket * WIP need to fix path encoding * Fixed path encoding * Added task and crawl to schema * Fixed paths in GitHub actions * Refactored completion handling * Fix tests * Trying to fix tests on CI * Removed redundant setting of tag * Removing references to S3 * Purging more DataAggregator references * Craking up logging to figure out test failure * Moved test_values into a fixture * Fixing GcpUnstructuredProvider * Fixed paths for future crawls * Renamed sqllite to official sqlite * Restored demo.py * Update openwpm/commands/profile_commands.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Restored previous behaviour of DumpProfileCommand Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed leftovers * Cleaned up comments * Expanded lock check * Fixed more stuff * More comment updates * Update openwpm/socket_interface.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed outdated comment * Using config_encoder * Renamed tar_location to tar_path * Removed references to database_name in docs * Cleanup * Moved screenshot_path and source_dump_path to ManagerParamsInternal * Fixed imports * Fixing up comments * Fixing up comments * More docs * updated dependencies * Fixed test_task_manager * Reupgraded to python 3.9.1 * Restoring crawl_reference in mp_logger * Removed unused imports * Apply suggestions from code review Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Cleaned up socket handling * Fixed TaskManager.__exit__ * Moved validation code into config.py * Removed comment * Removed comment * Removed comment Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>
2021-02-22 19:51:32 +03:00
from openwpm import command_sequence
Command refactoring (#750) * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * Ported SaveScreenshotFullPage #763 * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * Ported DumpPageSource and RecursiveDumpPageSource (#767) * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove custom function command and format code * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove duplicate append_command * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * generate new xpi * Fixing tests * Fixing tests * Fixing up more tests * Removed type annotations * Fixing tests * Fixing tests * Removed command_executor * Moved Commands to commands * Fixing imports * Fixed skipped test * Removed duplicate append_command * docs: update adding command in usingOpenWPM * Forgot to save * Removed datadir * Cleaning up imports * Implemented simple command * Added documentation to simple_command.py * Renamed to custom_command.py * Moved docs around * Referencing BaseCommand.execute * Update docs/Using_OpenWPM.md Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Cyrus <cyruskarsan@gmail.com> Co-authored-by: cyruskarsan <55566678+cyruskarsan@users.noreply.github.com> Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
2021-01-09 13:15:01 +03:00
from openwpm.commands.types import BaseCommand
Data Aggregator Rewrite (#753) * First steps in the rewrite * Fixed import paths * One giant refactor * Fixing tests * Adding mypy * Removed mypy from pre-commit workflow * First draft on DataAggregator * Wrote a DataAggregator that starts and shuts down * Created tests and added more empty types * Got demo.py working * Created sql_provider * Cleaned up imports in TaskManager * Added async * Fixed minor bugs * First steps at porting arrow * Introduced TableName and different Task handling * Added more failing tests * First first completes others don't * It works * Started working on arrow_provider * Implemented ArrowProvider * Added logger fixture * Fixed test_storage_controller * Fixing OpenWPMTest.visit() * Moved test/storage_providers to test/storage * Fixing up tests * Moved automation to openwpm * Readded datadir to .gitignore * Ran repin.sh * Fixed formatting * Let's see if this works * Fixed imports * Got arrow_memory_provider working * Starting to rewrite tests * Setting up fixtures * Attempting to fix all the tests * Still fixing tests * Broken content saving * Added node * Fixed screenshot tests * Fixing more tests * Fixed tests * Implemented local_storage.py * Cleaned up flush_cache * Fixing more tests * Wrote test for LocalArrowProvider * Introduced tests for local_storage_provider.py * Asserting test dir is empty * Creating subfolder for different aggregators * New depencies and init() * Everything is terribly broken * Figured out finalize_visit_id * Running two event loops kinda works??? * Rearming the event * Introduced mypy * Downgraded black in pre-commit * Modifying the database directly * Fixed formatting * Made mypy a lil stricter * Fixing docs and config printing * Realising I've been using the wrong with * Trying to figure arrow_storage * Moving lock initialization in in_memory_storage * Fixing tests * Fixing up tests and adding more typechecking * Fixed num_browsers in test_cache_hits_recorded * Parametrized unstructured * String fix * Added failing test * New test * Review changes with Steven * Fixed repin.sh and test_arrow_cache * Minor change * Fixed prune-environment.py * Removing references to DataAggregator * Fixed test_seed_persistance * More paths * Fixed test display shutdown * Made cache test more robust * Update crawler.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Slimming down ManagerParams * Fixing more tests * Update test/storage/test_storage_controller.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Purging references to DataAggregator * Reverted changes to .travis.yml * Demo.py saves locally again * Readjusting test paths * Expanded comment on initialize to reference #846 * Made token optional in finalize_visit_id * Simplified test paramtetrization * Fixed callback semantics change * Removed test_parse_http_stack_trace_str * Added DataSocket * WIP need to fix path encoding * Fixed path encoding * Added task and crawl to schema * Fixed paths in GitHub actions * Refactored completion handling * Fix tests * Trying to fix tests on CI * Removed redundant setting of tag * Removing references to S3 * Purging more DataAggregator references * Craking up logging to figure out test failure * Moved test_values into a fixture * Fixing GcpUnstructuredProvider * Fixed paths for future crawls * Renamed sqllite to official sqlite * Restored demo.py * Update openwpm/commands/profile_commands.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Restored previous behaviour of DumpProfileCommand Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed leftovers * Cleaned up comments * Expanded lock check * Fixed more stuff * More comment updates * Update openwpm/socket_interface.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed outdated comment * Using config_encoder * Renamed tar_location to tar_path * Removed references to database_name in docs * Cleanup * Moved screenshot_path and source_dump_path to ManagerParamsInternal * Fixed imports * Fixing up comments * Fixing up comments * More docs * updated dependencies * Fixed test_task_manager * Reupgraded to python 3.9.1 * Restoring crawl_reference in mp_logger * Removed unused imports * Apply suggestions from code review Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Cleaned up socket handling * Fixed TaskManager.__exit__ * Moved validation code into config.py * Removed comment * Removed comment * Removed comment Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>
2021-02-22 19:51:32 +03:00
from openwpm.config import BrowserParams, ManagerParamsInternal
Command refactoring (#750) * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * Ported SaveScreenshotFullPage #763 * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * Ported DumpPageSource and RecursiveDumpPageSource (#767) * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove custom function command and format code * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove duplicate append_command * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * generate new xpi * Fixing tests * Fixing tests * Fixing up more tests * Removed type annotations * Fixing tests * Fixing tests * Removed command_executor * Moved Commands to commands * Fixing imports * Fixed skipped test * Removed duplicate append_command * docs: update adding command in usingOpenWPM * Forgot to save * Removed datadir * Cleaning up imports * Implemented simple command * Added documentation to simple_command.py * Renamed to custom_command.py * Moved docs around * Referencing BaseCommand.execute * Update docs/Using_OpenWPM.md Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Cyrus <cyruskarsan@gmail.com> Co-authored-by: cyruskarsan <55566678+cyruskarsan@users.noreply.github.com> Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
2021-01-09 13:15:01 +03:00
from openwpm.socket_interface import ClientSocket
Data Aggregator Rewrite (#753) * First steps in the rewrite * Fixed import paths * One giant refactor * Fixing tests * Adding mypy * Removed mypy from pre-commit workflow * First draft on DataAggregator * Wrote a DataAggregator that starts and shuts down * Created tests and added more empty types * Got demo.py working * Created sql_provider * Cleaned up imports in TaskManager * Added async * Fixed minor bugs * First steps at porting arrow * Introduced TableName and different Task handling * Added more failing tests * First first completes others don't * It works * Started working on arrow_provider * Implemented ArrowProvider * Added logger fixture * Fixed test_storage_controller * Fixing OpenWPMTest.visit() * Moved test/storage_providers to test/storage * Fixing up tests * Moved automation to openwpm * Readded datadir to .gitignore * Ran repin.sh * Fixed formatting * Let's see if this works * Fixed imports * Got arrow_memory_provider working * Starting to rewrite tests * Setting up fixtures * Attempting to fix all the tests * Still fixing tests * Broken content saving * Added node * Fixed screenshot tests * Fixing more tests * Fixed tests * Implemented local_storage.py * Cleaned up flush_cache * Fixing more tests * Wrote test for LocalArrowProvider * Introduced tests for local_storage_provider.py * Asserting test dir is empty * Creating subfolder for different aggregators * New depencies and init() * Everything is terribly broken * Figured out finalize_visit_id * Running two event loops kinda works??? * Rearming the event * Introduced mypy * Downgraded black in pre-commit * Modifying the database directly * Fixed formatting * Made mypy a lil stricter * Fixing docs and config printing * Realising I've been using the wrong with * Trying to figure arrow_storage * Moving lock initialization in in_memory_storage * Fixing tests * Fixing up tests and adding more typechecking * Fixed num_browsers in test_cache_hits_recorded * Parametrized unstructured * String fix * Added failing test * New test * Review changes with Steven * Fixed repin.sh and test_arrow_cache * Minor change * Fixed prune-environment.py * Removing references to DataAggregator * Fixed test_seed_persistance * More paths * Fixed test display shutdown * Made cache test more robust * Update crawler.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Slimming down ManagerParams * Fixing more tests * Update test/storage/test_storage_controller.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Purging references to DataAggregator * Reverted changes to .travis.yml * Demo.py saves locally again * Readjusting test paths * Expanded comment on initialize to reference #846 * Made token optional in finalize_visit_id * Simplified test paramtetrization * Fixed callback semantics change * Removed test_parse_http_stack_trace_str * Added DataSocket * WIP need to fix path encoding * Fixed path encoding * Added task and crawl to schema * Fixed paths in GitHub actions * Refactored completion handling * Fix tests * Trying to fix tests on CI * Removed redundant setting of tag * Removing references to S3 * Purging more DataAggregator references * Craking up logging to figure out test failure * Moved test_values into a fixture * Fixing GcpUnstructuredProvider * Fixed paths for future crawls * Renamed sqllite to official sqlite * Restored demo.py * Update openwpm/commands/profile_commands.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Restored previous behaviour of DumpProfileCommand Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed leftovers * Cleaned up comments * Expanded lock check * Fixed more stuff * More comment updates * Update openwpm/socket_interface.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed outdated comment * Using config_encoder * Renamed tar_location to tar_path * Removed references to database_name in docs * Cleanup * Moved screenshot_path and source_dump_path to ManagerParamsInternal * Fixed imports * Fixing up comments * Fixing up comments * More docs * updated dependencies * Fixed test_task_manager * Reupgraded to python 3.9.1 * Restoring crawl_reference in mp_logger * Removed unused imports * Apply suggestions from code review Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Cleaned up socket handling * Fixed TaskManager.__exit__ * Moved validation code into config.py * Removed comment * Removed comment * Removed comment Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>
2021-02-22 19:51:32 +03:00
from openwpm.storage.sql_provider import SQLiteStorageProvider
from openwpm.storage.storage_providers import TableName
from openwpm.task_manager import TaskManager
from openwpm.utilities import db_utils
2019-04-26 20:07:40 +03:00
from . import utilities
2020-09-11 16:14:09 +03:00
url_a = utilities.BASE_TEST_URL + "/simple_a.html"
PAGE_LINKS = {
2020-09-11 16:14:09 +03:00
(
f"{utilities.BASE_TEST_URL}/simple_a.html",
f"{utilities.BASE_TEST_URL}/simple_c.html",
2020-09-11 16:14:09 +03:00
),
(
f"{utilities.BASE_TEST_URL}/simple_a.html",
f"{utilities.BASE_TEST_URL}/simple_d.html",
2020-09-11 16:14:09 +03:00
),
(
f"{utilities.BASE_TEST_URL}/simple_a.html",
"http://example.com/test.html?localhost",
2020-09-11 16:14:09 +03:00
),
}
Command refactoring (#750) * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * Ported SaveScreenshotFullPage #763 * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * Ported DumpPageSource and RecursiveDumpPageSource (#767) * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove custom function command and format code * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove duplicate append_command * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * generate new xpi * Fixing tests * Fixing tests * Fixing up more tests * Removed type annotations * Fixing tests * Fixing tests * Removed command_executor * Moved Commands to commands * Fixing imports * Fixed skipped test * Removed duplicate append_command * docs: update adding command in usingOpenWPM * Forgot to save * Removed datadir * Cleaning up imports * Implemented simple command * Added documentation to simple_command.py * Renamed to custom_command.py * Moved docs around * Referencing BaseCommand.execute * Update docs/Using_OpenWPM.md Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Cyrus <cyruskarsan@gmail.com> Co-authored-by: cyruskarsan <55566678+cyruskarsan@users.noreply.github.com> Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
2021-01-09 13:15:01 +03:00
class CollectLinksCommand(BaseCommand):
"""Collect links with `scheme` and save in table `table_name`"""
Command refactoring (#750) * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * Ported SaveScreenshotFullPage #763 * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * Ported DumpPageSource and RecursiveDumpPageSource (#767) * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove custom function command and format code * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove duplicate append_command * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * generate new xpi * Fixing tests * Fixing tests * Fixing up more tests * Removed type annotations * Fixing tests * Fixing tests * Removed command_executor * Moved Commands to commands * Fixing imports * Fixed skipped test * Removed duplicate append_command * docs: update adding command in usingOpenWPM * Forgot to save * Removed datadir * Cleaning up imports * Implemented simple command * Added documentation to simple_command.py * Renamed to custom_command.py * Moved docs around * Referencing BaseCommand.execute * Update docs/Using_OpenWPM.md Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Cyrus <cyruskarsan@gmail.com> Co-authored-by: cyruskarsan <55566678+cyruskarsan@users.noreply.github.com> Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
2021-01-09 13:15:01 +03:00
Data Aggregator Rewrite (#753) * First steps in the rewrite * Fixed import paths * One giant refactor * Fixing tests * Adding mypy * Removed mypy from pre-commit workflow * First draft on DataAggregator * Wrote a DataAggregator that starts and shuts down * Created tests and added more empty types * Got demo.py working * Created sql_provider * Cleaned up imports in TaskManager * Added async * Fixed minor bugs * First steps at porting arrow * Introduced TableName and different Task handling * Added more failing tests * First first completes others don't * It works * Started working on arrow_provider * Implemented ArrowProvider * Added logger fixture * Fixed test_storage_controller * Fixing OpenWPMTest.visit() * Moved test/storage_providers to test/storage * Fixing up tests * Moved automation to openwpm * Readded datadir to .gitignore * Ran repin.sh * Fixed formatting * Let's see if this works * Fixed imports * Got arrow_memory_provider working * Starting to rewrite tests * Setting up fixtures * Attempting to fix all the tests * Still fixing tests * Broken content saving * Added node * Fixed screenshot tests * Fixing more tests * Fixed tests * Implemented local_storage.py * Cleaned up flush_cache * Fixing more tests * Wrote test for LocalArrowProvider * Introduced tests for local_storage_provider.py * Asserting test dir is empty * Creating subfolder for different aggregators * New depencies and init() * Everything is terribly broken * Figured out finalize_visit_id * Running two event loops kinda works??? * Rearming the event * Introduced mypy * Downgraded black in pre-commit * Modifying the database directly * Fixed formatting * Made mypy a lil stricter * Fixing docs and config printing * Realising I've been using the wrong with * Trying to figure arrow_storage * Moving lock initialization in in_memory_storage * Fixing tests * Fixing up tests and adding more typechecking * Fixed num_browsers in test_cache_hits_recorded * Parametrized unstructured * String fix * Added failing test * New test * Review changes with Steven * Fixed repin.sh and test_arrow_cache * Minor change * Fixed prune-environment.py * Removing references to DataAggregator * Fixed test_seed_persistance * More paths * Fixed test display shutdown * Made cache test more robust * Update crawler.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Slimming down ManagerParams * Fixing more tests * Update test/storage/test_storage_controller.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Purging references to DataAggregator * Reverted changes to .travis.yml * Demo.py saves locally again * Readjusting test paths * Expanded comment on initialize to reference #846 * Made token optional in finalize_visit_id * Simplified test paramtetrization * Fixed callback semantics change * Removed test_parse_http_stack_trace_str * Added DataSocket * WIP need to fix path encoding * Fixed path encoding * Added task and crawl to schema * Fixed paths in GitHub actions * Refactored completion handling * Fix tests * Trying to fix tests on CI * Removed redundant setting of tag * Removing references to S3 * Purging more DataAggregator references * Craking up logging to figure out test failure * Moved test_values into a fixture * Fixing GcpUnstructuredProvider * Fixed paths for future crawls * Renamed sqllite to official sqlite * Restored demo.py * Update openwpm/commands/profile_commands.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Restored previous behaviour of DumpProfileCommand Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed leftovers * Cleaned up comments * Expanded lock check * Fixed more stuff * More comment updates * Update openwpm/socket_interface.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed outdated comment * Using config_encoder * Renamed tar_location to tar_path * Removed references to database_name in docs * Cleanup * Moved screenshot_path and source_dump_path to ManagerParamsInternal * Fixed imports * Fixing up comments * Fixing up comments * More docs * updated dependencies * Fixed test_task_manager * Reupgraded to python 3.9.1 * Restoring crawl_reference in mp_logger * Removed unused imports * Apply suggestions from code review Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Cleaned up socket handling * Fixed TaskManager.__exit__ * Moved validation code into config.py * Removed comment * Removed comment * Removed comment Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>
2021-02-22 19:51:32 +03:00
def __init__(self, table_name: TableName, scheme: str) -> None:
Command refactoring (#750) * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * Ported SaveScreenshotFullPage #763 * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * Ported DumpPageSource and RecursiveDumpPageSource (#767) * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove custom function command and format code * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove duplicate append_command * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * generate new xpi * Fixing tests * Fixing tests * Fixing up more tests * Removed type annotations * Fixing tests * Fixing tests * Removed command_executor * Moved Commands to commands * Fixing imports * Fixed skipped test * Removed duplicate append_command * docs: update adding command in usingOpenWPM * Forgot to save * Removed datadir * Cleaning up imports * Implemented simple command * Added documentation to simple_command.py * Renamed to custom_command.py * Moved docs around * Referencing BaseCommand.execute * Update docs/Using_OpenWPM.md Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Cyrus <cyruskarsan@gmail.com> Co-authored-by: cyruskarsan <55566678+cyruskarsan@users.noreply.github.com> Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
2021-01-09 13:15:01 +03:00
self.scheme = scheme
self.table_name = table_name
def execute(
self,
webdriver: Firefox,
browser_params: BrowserParams,
Data Aggregator Rewrite (#753) * First steps in the rewrite * Fixed import paths * One giant refactor * Fixing tests * Adding mypy * Removed mypy from pre-commit workflow * First draft on DataAggregator * Wrote a DataAggregator that starts and shuts down * Created tests and added more empty types * Got demo.py working * Created sql_provider * Cleaned up imports in TaskManager * Added async * Fixed minor bugs * First steps at porting arrow * Introduced TableName and different Task handling * Added more failing tests * First first completes others don't * It works * Started working on arrow_provider * Implemented ArrowProvider * Added logger fixture * Fixed test_storage_controller * Fixing OpenWPMTest.visit() * Moved test/storage_providers to test/storage * Fixing up tests * Moved automation to openwpm * Readded datadir to .gitignore * Ran repin.sh * Fixed formatting * Let's see if this works * Fixed imports * Got arrow_memory_provider working * Starting to rewrite tests * Setting up fixtures * Attempting to fix all the tests * Still fixing tests * Broken content saving * Added node * Fixed screenshot tests * Fixing more tests * Fixed tests * Implemented local_storage.py * Cleaned up flush_cache * Fixing more tests * Wrote test for LocalArrowProvider * Introduced tests for local_storage_provider.py * Asserting test dir is empty * Creating subfolder for different aggregators * New depencies and init() * Everything is terribly broken * Figured out finalize_visit_id * Running two event loops kinda works??? * Rearming the event * Introduced mypy * Downgraded black in pre-commit * Modifying the database directly * Fixed formatting * Made mypy a lil stricter * Fixing docs and config printing * Realising I've been using the wrong with * Trying to figure arrow_storage * Moving lock initialization in in_memory_storage * Fixing tests * Fixing up tests and adding more typechecking * Fixed num_browsers in test_cache_hits_recorded * Parametrized unstructured * String fix * Added failing test * New test * Review changes with Steven * Fixed repin.sh and test_arrow_cache * Minor change * Fixed prune-environment.py * Removing references to DataAggregator * Fixed test_seed_persistance * More paths * Fixed test display shutdown * Made cache test more robust * Update crawler.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Slimming down ManagerParams * Fixing more tests * Update test/storage/test_storage_controller.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Purging references to DataAggregator * Reverted changes to .travis.yml * Demo.py saves locally again * Readjusting test paths * Expanded comment on initialize to reference #846 * Made token optional in finalize_visit_id * Simplified test paramtetrization * Fixed callback semantics change * Removed test_parse_http_stack_trace_str * Added DataSocket * WIP need to fix path encoding * Fixed path encoding * Added task and crawl to schema * Fixed paths in GitHub actions * Refactored completion handling * Fix tests * Trying to fix tests on CI * Removed redundant setting of tag * Removing references to S3 * Purging more DataAggregator references * Craking up logging to figure out test failure * Moved test_values into a fixture * Fixing GcpUnstructuredProvider * Fixed paths for future crawls * Renamed sqllite to official sqlite * Restored demo.py * Update openwpm/commands/profile_commands.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Restored previous behaviour of DumpProfileCommand Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed leftovers * Cleaned up comments * Expanded lock check * Fixed more stuff * More comment updates * Update openwpm/socket_interface.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed outdated comment * Using config_encoder * Renamed tar_location to tar_path * Removed references to database_name in docs * Cleanup * Moved screenshot_path and source_dump_path to ManagerParamsInternal * Fixed imports * Fixing up comments * Fixing up comments * More docs * updated dependencies * Fixed test_task_manager * Reupgraded to python 3.9.1 * Restoring crawl_reference in mp_logger * Removed unused imports * Apply suggestions from code review Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Cleaned up socket handling * Fixed TaskManager.__exit__ * Moved validation code into config.py * Removed comment * Removed comment * Removed comment Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>
2021-02-22 19:51:32 +03:00
manager_params: ManagerParamsInternal,
Command refactoring (#750) * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * Ported SaveScreenshotFullPage #763 * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * Ported DumpPageSource and RecursiveDumpPageSource (#767) * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove custom function command and format code * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove duplicate append_command * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * generate new xpi * Fixing tests * Fixing tests * Fixing up more tests * Removed type annotations * Fixing tests * Fixing tests * Removed command_executor * Moved Commands to commands * Fixing imports * Fixed skipped test * Removed duplicate append_command * docs: update adding command in usingOpenWPM * Forgot to save * Removed datadir * Cleaning up imports * Implemented simple command * Added documentation to simple_command.py * Renamed to custom_command.py * Moved docs around * Referencing BaseCommand.execute * Update docs/Using_OpenWPM.md Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Cyrus <cyruskarsan@gmail.com> Co-authored-by: cyruskarsan <55566678+cyruskarsan@users.noreply.github.com> Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
2021-01-09 13:15:01 +03:00
extension_socket: ClientSocket,
) -> None:
Data Aggregator Rewrite (#753) * First steps in the rewrite * Fixed import paths * One giant refactor * Fixing tests * Adding mypy * Removed mypy from pre-commit workflow * First draft on DataAggregator * Wrote a DataAggregator that starts and shuts down * Created tests and added more empty types * Got demo.py working * Created sql_provider * Cleaned up imports in TaskManager * Added async * Fixed minor bugs * First steps at porting arrow * Introduced TableName and different Task handling * Added more failing tests * First first completes others don't * It works * Started working on arrow_provider * Implemented ArrowProvider * Added logger fixture * Fixed test_storage_controller * Fixing OpenWPMTest.visit() * Moved test/storage_providers to test/storage * Fixing up tests * Moved automation to openwpm * Readded datadir to .gitignore * Ran repin.sh * Fixed formatting * Let's see if this works * Fixed imports * Got arrow_memory_provider working * Starting to rewrite tests * Setting up fixtures * Attempting to fix all the tests * Still fixing tests * Broken content saving * Added node * Fixed screenshot tests * Fixing more tests * Fixed tests * Implemented local_storage.py * Cleaned up flush_cache * Fixing more tests * Wrote test for LocalArrowProvider * Introduced tests for local_storage_provider.py * Asserting test dir is empty * Creating subfolder for different aggregators * New depencies and init() * Everything is terribly broken * Figured out finalize_visit_id * Running two event loops kinda works??? * Rearming the event * Introduced mypy * Downgraded black in pre-commit * Modifying the database directly * Fixed formatting * Made mypy a lil stricter * Fixing docs and config printing * Realising I've been using the wrong with * Trying to figure arrow_storage * Moving lock initialization in in_memory_storage * Fixing tests * Fixing up tests and adding more typechecking * Fixed num_browsers in test_cache_hits_recorded * Parametrized unstructured * String fix * Added failing test * New test * Review changes with Steven * Fixed repin.sh and test_arrow_cache * Minor change * Fixed prune-environment.py * Removing references to DataAggregator * Fixed test_seed_persistance * More paths * Fixed test display shutdown * Made cache test more robust * Update crawler.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Slimming down ManagerParams * Fixing more tests * Update test/storage/test_storage_controller.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Purging references to DataAggregator * Reverted changes to .travis.yml * Demo.py saves locally again * Readjusting test paths * Expanded comment on initialize to reference #846 * Made token optional in finalize_visit_id * Simplified test paramtetrization * Fixed callback semantics change * Removed test_parse_http_stack_trace_str * Added DataSocket * WIP need to fix path encoding * Fixed path encoding * Added task and crawl to schema * Fixed paths in GitHub actions * Refactored completion handling * Fix tests * Trying to fix tests on CI * Removed redundant setting of tag * Removing references to S3 * Purging more DataAggregator references * Craking up logging to figure out test failure * Moved test_values into a fixture * Fixing GcpUnstructuredProvider * Fixed paths for future crawls * Renamed sqllite to official sqlite * Restored demo.py * Update openwpm/commands/profile_commands.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Restored previous behaviour of DumpProfileCommand Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed leftovers * Cleaned up comments * Expanded lock check * Fixed more stuff * More comment updates * Update openwpm/socket_interface.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed outdated comment * Using config_encoder * Renamed tar_location to tar_path * Removed references to database_name in docs * Cleanup * Moved screenshot_path and source_dump_path to ManagerParamsInternal * Fixed imports * Fixing up comments * Fixing up comments * More docs * updated dependencies * Fixed test_task_manager * Reupgraded to python 3.9.1 * Restoring crawl_reference in mp_logger * Removed unused imports * Apply suggestions from code review Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Cleaned up socket handling * Fixed TaskManager.__exit__ * Moved validation code into config.py * Removed comment * Removed comment * Removed comment Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>
2021-02-22 19:51:32 +03:00
browser_id = self.browser_id
visit_id = self.visit_id
Command refactoring (#750) * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * Ported SaveScreenshotFullPage #763 * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * Ported DumpPageSource and RecursiveDumpPageSource (#767) * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove custom function command and format code * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove duplicate append_command * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * generate new xpi * Fixing tests * Fixing tests * Fixing up more tests * Removed type annotations * Fixing tests * Fixing tests * Removed command_executor * Moved Commands to commands * Fixing imports * Fixed skipped test * Removed duplicate append_command * docs: update adding command in usingOpenWPM * Forgot to save * Removed datadir * Cleaning up imports * Implemented simple command * Added documentation to simple_command.py * Renamed to custom_command.py * Moved docs around * Referencing BaseCommand.execute * Update docs/Using_OpenWPM.md Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Cyrus <cyruskarsan@gmail.com> Co-authored-by: cyruskarsan <55566678+cyruskarsan@users.noreply.github.com> Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
2021-01-09 13:15:01 +03:00
link_urls = [
x
for x in (
element.get_attribute("href")
for element in webdriver.find_elements_by_tag_name("a")
)
if x.startswith(self.scheme + "://")
]
current_url = webdriver.current_url
sock = ClientSocket()
Data Aggregator Rewrite (#753) * First steps in the rewrite * Fixed import paths * One giant refactor * Fixing tests * Adding mypy * Removed mypy from pre-commit workflow * First draft on DataAggregator * Wrote a DataAggregator that starts and shuts down * Created tests and added more empty types * Got demo.py working * Created sql_provider * Cleaned up imports in TaskManager * Added async * Fixed minor bugs * First steps at porting arrow * Introduced TableName and different Task handling * Added more failing tests * First first completes others don't * It works * Started working on arrow_provider * Implemented ArrowProvider * Added logger fixture * Fixed test_storage_controller * Fixing OpenWPMTest.visit() * Moved test/storage_providers to test/storage * Fixing up tests * Moved automation to openwpm * Readded datadir to .gitignore * Ran repin.sh * Fixed formatting * Let's see if this works * Fixed imports * Got arrow_memory_provider working * Starting to rewrite tests * Setting up fixtures * Attempting to fix all the tests * Still fixing tests * Broken content saving * Added node * Fixed screenshot tests * Fixing more tests * Fixed tests * Implemented local_storage.py * Cleaned up flush_cache * Fixing more tests * Wrote test for LocalArrowProvider * Introduced tests for local_storage_provider.py * Asserting test dir is empty * Creating subfolder for different aggregators * New depencies and init() * Everything is terribly broken * Figured out finalize_visit_id * Running two event loops kinda works??? * Rearming the event * Introduced mypy * Downgraded black in pre-commit * Modifying the database directly * Fixed formatting * Made mypy a lil stricter * Fixing docs and config printing * Realising I've been using the wrong with * Trying to figure arrow_storage * Moving lock initialization in in_memory_storage * Fixing tests * Fixing up tests and adding more typechecking * Fixed num_browsers in test_cache_hits_recorded * Parametrized unstructured * String fix * Added failing test * New test * Review changes with Steven * Fixed repin.sh and test_arrow_cache * Minor change * Fixed prune-environment.py * Removing references to DataAggregator * Fixed test_seed_persistance * More paths * Fixed test display shutdown * Made cache test more robust * Update crawler.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Slimming down ManagerParams * Fixing more tests * Update test/storage/test_storage_controller.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Purging references to DataAggregator * Reverted changes to .travis.yml * Demo.py saves locally again * Readjusting test paths * Expanded comment on initialize to reference #846 * Made token optional in finalize_visit_id * Simplified test paramtetrization * Fixed callback semantics change * Removed test_parse_http_stack_trace_str * Added DataSocket * WIP need to fix path encoding * Fixed path encoding * Added task and crawl to schema * Fixed paths in GitHub actions * Refactored completion handling * Fix tests * Trying to fix tests on CI * Removed redundant setting of tag * Removing references to S3 * Purging more DataAggregator references * Craking up logging to figure out test failure * Moved test_values into a fixture * Fixing GcpUnstructuredProvider * Fixed paths for future crawls * Renamed sqllite to official sqlite * Restored demo.py * Update openwpm/commands/profile_commands.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Restored previous behaviour of DumpProfileCommand Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed leftovers * Cleaned up comments * Expanded lock check * Fixed more stuff * More comment updates * Update openwpm/socket_interface.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed outdated comment * Using config_encoder * Renamed tar_location to tar_path * Removed references to database_name in docs * Cleanup * Moved screenshot_path and source_dump_path to ManagerParamsInternal * Fixed imports * Fixing up comments * Fixing up comments * More docs * updated dependencies * Fixed test_task_manager * Reupgraded to python 3.9.1 * Restoring crawl_reference in mp_logger * Removed unused imports * Apply suggestions from code review Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Cleaned up socket handling * Fixed TaskManager.__exit__ * Moved validation code into config.py * Removed comment * Removed comment * Removed comment Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>
2021-02-22 19:51:32 +03:00
assert manager_params.storage_controller_address is not None
sock.connect(*manager_params.storage_controller_address)
Command refactoring (#750) * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * Ported SaveScreenshotFullPage #763 * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * Ported DumpPageSource and RecursiveDumpPageSource (#767) * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove custom function command and format code * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove duplicate append_command * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * generate new xpi * Fixing tests * Fixing tests * Fixing up more tests * Removed type annotations * Fixing tests * Fixing tests * Removed command_executor * Moved Commands to commands * Fixing imports * Fixed skipped test * Removed duplicate append_command * docs: update adding command in usingOpenWPM * Forgot to save * Removed datadir * Cleaning up imports * Implemented simple command * Added documentation to simple_command.py * Renamed to custom_command.py * Moved docs around * Referencing BaseCommand.execute * Update docs/Using_OpenWPM.md Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Cyrus <cyruskarsan@gmail.com> Co-authored-by: cyruskarsan <55566678+cyruskarsan@users.noreply.github.com> Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
2021-01-09 13:15:01 +03:00
for link in link_urls:
query = (
self.table_name,
{
"top_url": current_url,
"link": link,
Data Aggregator Rewrite (#753) * First steps in the rewrite * Fixed import paths * One giant refactor * Fixing tests * Adding mypy * Removed mypy from pre-commit workflow * First draft on DataAggregator * Wrote a DataAggregator that starts and shuts down * Created tests and added more empty types * Got demo.py working * Created sql_provider * Cleaned up imports in TaskManager * Added async * Fixed minor bugs * First steps at porting arrow * Introduced TableName and different Task handling * Added more failing tests * First first completes others don't * It works * Started working on arrow_provider * Implemented ArrowProvider * Added logger fixture * Fixed test_storage_controller * Fixing OpenWPMTest.visit() * Moved test/storage_providers to test/storage * Fixing up tests * Moved automation to openwpm * Readded datadir to .gitignore * Ran repin.sh * Fixed formatting * Let's see if this works * Fixed imports * Got arrow_memory_provider working * Starting to rewrite tests * Setting up fixtures * Attempting to fix all the tests * Still fixing tests * Broken content saving * Added node * Fixed screenshot tests * Fixing more tests * Fixed tests * Implemented local_storage.py * Cleaned up flush_cache * Fixing more tests * Wrote test for LocalArrowProvider * Introduced tests for local_storage_provider.py * Asserting test dir is empty * Creating subfolder for different aggregators * New depencies and init() * Everything is terribly broken * Figured out finalize_visit_id * Running two event loops kinda works??? * Rearming the event * Introduced mypy * Downgraded black in pre-commit * Modifying the database directly * Fixed formatting * Made mypy a lil stricter * Fixing docs and config printing * Realising I've been using the wrong with * Trying to figure arrow_storage * Moving lock initialization in in_memory_storage * Fixing tests * Fixing up tests and adding more typechecking * Fixed num_browsers in test_cache_hits_recorded * Parametrized unstructured * String fix * Added failing test * New test * Review changes with Steven * Fixed repin.sh and test_arrow_cache * Minor change * Fixed prune-environment.py * Removing references to DataAggregator * Fixed test_seed_persistance * More paths * Fixed test display shutdown * Made cache test more robust * Update crawler.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Slimming down ManagerParams * Fixing more tests * Update test/storage/test_storage_controller.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Purging references to DataAggregator * Reverted changes to .travis.yml * Demo.py saves locally again * Readjusting test paths * Expanded comment on initialize to reference #846 * Made token optional in finalize_visit_id * Simplified test paramtetrization * Fixed callback semantics change * Removed test_parse_http_stack_trace_str * Added DataSocket * WIP need to fix path encoding * Fixed path encoding * Added task and crawl to schema * Fixed paths in GitHub actions * Refactored completion handling * Fix tests * Trying to fix tests on CI * Removed redundant setting of tag * Removing references to S3 * Purging more DataAggregator references * Craking up logging to figure out test failure * Moved test_values into a fixture * Fixing GcpUnstructuredProvider * Fixed paths for future crawls * Renamed sqllite to official sqlite * Restored demo.py * Update openwpm/commands/profile_commands.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Restored previous behaviour of DumpProfileCommand Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed leftovers * Cleaned up comments * Expanded lock check * Fixed more stuff * More comment updates * Update openwpm/socket_interface.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed outdated comment * Using config_encoder * Renamed tar_location to tar_path * Removed references to database_name in docs * Cleanup * Moved screenshot_path and source_dump_path to ManagerParamsInternal * Fixed imports * Fixing up comments * Fixing up comments * More docs * updated dependencies * Fixed test_task_manager * Reupgraded to python 3.9.1 * Restoring crawl_reference in mp_logger * Removed unused imports * Apply suggestions from code review Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Cleaned up socket handling * Fixed TaskManager.__exit__ * Moved validation code into config.py * Removed comment * Removed comment * Removed comment Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>
2021-02-22 19:51:32 +03:00
"visit_id": visit_id,
"browser_id": browser_id,
Command refactoring (#750) * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * Ported SaveScreenshotFullPage #763 * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * Ported DumpPageSource and RecursiveDumpPageSource (#767) * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove custom function command and format code * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove duplicate append_command * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * generate new xpi * Fixing tests * Fixing tests * Fixing up more tests * Removed type annotations * Fixing tests * Fixing tests * Removed command_executor * Moved Commands to commands * Fixing imports * Fixed skipped test * Removed duplicate append_command * docs: update adding command in usingOpenWPM * Forgot to save * Removed datadir * Cleaning up imports * Implemented simple command * Added documentation to simple_command.py * Renamed to custom_command.py * Moved docs around * Referencing BaseCommand.execute * Update docs/Using_OpenWPM.md Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Cyrus <cyruskarsan@gmail.com> Co-authored-by: cyruskarsan <55566678+cyruskarsan@users.noreply.github.com> Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
2021-01-09 13:15:01 +03:00
},
)
sock.send(query)
sock.close()
Data Aggregator Rewrite (#753) * First steps in the rewrite * Fixed import paths * One giant refactor * Fixing tests * Adding mypy * Removed mypy from pre-commit workflow * First draft on DataAggregator * Wrote a DataAggregator that starts and shuts down * Created tests and added more empty types * Got demo.py working * Created sql_provider * Cleaned up imports in TaskManager * Added async * Fixed minor bugs * First steps at porting arrow * Introduced TableName and different Task handling * Added more failing tests * First first completes others don't * It works * Started working on arrow_provider * Implemented ArrowProvider * Added logger fixture * Fixed test_storage_controller * Fixing OpenWPMTest.visit() * Moved test/storage_providers to test/storage * Fixing up tests * Moved automation to openwpm * Readded datadir to .gitignore * Ran repin.sh * Fixed formatting * Let's see if this works * Fixed imports * Got arrow_memory_provider working * Starting to rewrite tests * Setting up fixtures * Attempting to fix all the tests * Still fixing tests * Broken content saving * Added node * Fixed screenshot tests * Fixing more tests * Fixed tests * Implemented local_storage.py * Cleaned up flush_cache * Fixing more tests * Wrote test for LocalArrowProvider * Introduced tests for local_storage_provider.py * Asserting test dir is empty * Creating subfolder for different aggregators * New depencies and init() * Everything is terribly broken * Figured out finalize_visit_id * Running two event loops kinda works??? * Rearming the event * Introduced mypy * Downgraded black in pre-commit * Modifying the database directly * Fixed formatting * Made mypy a lil stricter * Fixing docs and config printing * Realising I've been using the wrong with * Trying to figure arrow_storage * Moving lock initialization in in_memory_storage * Fixing tests * Fixing up tests and adding more typechecking * Fixed num_browsers in test_cache_hits_recorded * Parametrized unstructured * String fix * Added failing test * New test * Review changes with Steven * Fixed repin.sh and test_arrow_cache * Minor change * Fixed prune-environment.py * Removing references to DataAggregator * Fixed test_seed_persistance * More paths * Fixed test display shutdown * Made cache test more robust * Update crawler.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Slimming down ManagerParams * Fixing more tests * Update test/storage/test_storage_controller.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Purging references to DataAggregator * Reverted changes to .travis.yml * Demo.py saves locally again * Readjusting test paths * Expanded comment on initialize to reference #846 * Made token optional in finalize_visit_id * Simplified test paramtetrization * Fixed callback semantics change * Removed test_parse_http_stack_trace_str * Added DataSocket * WIP need to fix path encoding * Fixed path encoding * Added task and crawl to schema * Fixed paths in GitHub actions * Refactored completion handling * Fix tests * Trying to fix tests on CI * Removed redundant setting of tag * Removing references to S3 * Purging more DataAggregator references * Craking up logging to figure out test failure * Moved test_values into a fixture * Fixing GcpUnstructuredProvider * Fixed paths for future crawls * Renamed sqllite to official sqlite * Restored demo.py * Update openwpm/commands/profile_commands.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Restored previous behaviour of DumpProfileCommand Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed leftovers * Cleaned up comments * Expanded lock check * Fixed more stuff * More comment updates * Update openwpm/socket_interface.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed outdated comment * Using config_encoder * Renamed tar_location to tar_path * Removed references to database_name in docs * Cleanup * Moved screenshot_path and source_dump_path to ManagerParamsInternal * Fixed imports * Fixing up comments * Fixing up comments * More docs * updated dependencies * Fixed test_task_manager * Reupgraded to python 3.9.1 * Restoring crawl_reference in mp_logger * Removed unused imports * Apply suggestions from code review Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Cleaned up socket handling * Fixed TaskManager.__exit__ * Moved validation code into config.py * Removed comment * Removed comment * Removed comment Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>
2021-02-22 19:51:32 +03:00
def test_custom_function(default_params, xpi, server):
"""Test `custom_function` with an inline func that collects links"""
Data Aggregator Rewrite (#753) * First steps in the rewrite * Fixed import paths * One giant refactor * Fixing tests * Adding mypy * Removed mypy from pre-commit workflow * First draft on DataAggregator * Wrote a DataAggregator that starts and shuts down * Created tests and added more empty types * Got demo.py working * Created sql_provider * Cleaned up imports in TaskManager * Added async * Fixed minor bugs * First steps at porting arrow * Introduced TableName and different Task handling * Added more failing tests * First first completes others don't * It works * Started working on arrow_provider * Implemented ArrowProvider * Added logger fixture * Fixed test_storage_controller * Fixing OpenWPMTest.visit() * Moved test/storage_providers to test/storage * Fixing up tests * Moved automation to openwpm * Readded datadir to .gitignore * Ran repin.sh * Fixed formatting * Let's see if this works * Fixed imports * Got arrow_memory_provider working * Starting to rewrite tests * Setting up fixtures * Attempting to fix all the tests * Still fixing tests * Broken content saving * Added node * Fixed screenshot tests * Fixing more tests * Fixed tests * Implemented local_storage.py * Cleaned up flush_cache * Fixing more tests * Wrote test for LocalArrowProvider * Introduced tests for local_storage_provider.py * Asserting test dir is empty * Creating subfolder for different aggregators * New depencies and init() * Everything is terribly broken * Figured out finalize_visit_id * Running two event loops kinda works??? * Rearming the event * Introduced mypy * Downgraded black in pre-commit * Modifying the database directly * Fixed formatting * Made mypy a lil stricter * Fixing docs and config printing * Realising I've been using the wrong with * Trying to figure arrow_storage * Moving lock initialization in in_memory_storage * Fixing tests * Fixing up tests and adding more typechecking * Fixed num_browsers in test_cache_hits_recorded * Parametrized unstructured * String fix * Added failing test * New test * Review changes with Steven * Fixed repin.sh and test_arrow_cache * Minor change * Fixed prune-environment.py * Removing references to DataAggregator * Fixed test_seed_persistance * More paths * Fixed test display shutdown * Made cache test more robust * Update crawler.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Slimming down ManagerParams * Fixing more tests * Update test/storage/test_storage_controller.py Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Purging references to DataAggregator * Reverted changes to .travis.yml * Demo.py saves locally again * Readjusting test paths * Expanded comment on initialize to reference #846 * Made token optional in finalize_visit_id * Simplified test paramtetrization * Fixed callback semantics change * Removed test_parse_http_stack_trace_str * Added DataSocket * WIP need to fix path encoding * Fixed path encoding * Added task and crawl to schema * Fixed paths in GitHub actions * Refactored completion handling * Fix tests * Trying to fix tests on CI * Removed redundant setting of tag * Removing references to S3 * Purging more DataAggregator references * Craking up logging to figure out test failure * Moved test_values into a fixture * Fixing GcpUnstructuredProvider * Fixed paths for future crawls * Renamed sqllite to official sqlite * Restored demo.py * Update openwpm/commands/profile_commands.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Restored previous behaviour of DumpProfileCommand Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed leftovers * Cleaned up comments * Expanded lock check * Fixed more stuff * More comment updates * Update openwpm/socket_interface.py Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com> * Removed outdated comment * Using config_encoder * Renamed tar_location to tar_path * Removed references to database_name in docs * Cleanup * Moved screenshot_path and source_dump_path to ManagerParamsInternal * Fixed imports * Fixing up comments * Fixing up comments * More docs * updated dependencies * Fixed test_task_manager * Reupgraded to python 3.9.1 * Restoring crawl_reference in mp_logger * Removed unused imports * Apply suggestions from code review Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> * Cleaned up socket handling * Fixed TaskManager.__exit__ * Moved validation code into config.py * Removed comment * Removed comment * Removed comment Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>
2021-02-22 19:51:32 +03:00
table_name = TableName("page_links")
manager_params, browser_params = default_params
path = manager_params.data_directory / "crawl-data.sqlite"
db = sqlite3.connect(path)
cur = db.cursor()
cur.execute(
"""CREATE TABLE IF NOT EXISTS %s (
top_url TEXT, link TEXT,
visit_id INTEGER, browser_id INTEGER);"""
% table_name
)
cur.close()
db.close()
storage_provider = SQLiteStorageProvider(path)
manager = TaskManager(manager_params, browser_params, storage_provider, None)
cs = command_sequence.CommandSequence(url_a)
cs.get(sleep=0, timeout=60)
cs.append_command(CollectLinksCommand(table_name, "http"))
manager.execute_command_sequence(cs)
manager.close()
query_result = db_utils.query_db(
path,
"SELECT top_url, link FROM page_links;",
as_tuple=True,
)
assert PAGE_LINKS == set(query_result)