OpenWPM/test/test_http_instrumentation.py


#!/usr/bin/python
# -*- coding: utf-8 -*-
import base64
import json
import os
from hashlib import sha256
from pathlib import Path
from time import sleep
from typing import List, Optional, Set, Tuple
from urllib.parse import urlparse
import pytest
from openwpm import command_sequence, task_manager
from openwpm.command_sequence import CommandSequence
from openwpm.commands.types import BaseCommand
from openwpm.config import BrowserParams, ManagerParams
from openwpm.storage.leveldb import LevelDbProvider
from openwpm.storage.sql_provider import SQLiteStorageProvider
from openwpm.utilities import db_utils
from . import utilities
from .openwpmtest import OpenWPMTest
# Data for test_page_visit
# format: (
#     request_url,
#     top_level_url,
#     triggering_origin,
#     loading_origin,
#     loading_href,
#     is_XHR, is_tp_content, is_tp_window,
#     resource_type
# )
HTTP_REQUESTS = {
    (
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        "undefined",
        "undefined",
        "undefined",
        0,
        None,
        None,
        "main_frame",
    ),
    (
        f"{utilities.BASE_TEST_URL}/shared/test_favicon.ico",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        0,
        None,
        None,
        "image",
    ),
    (
        f"{utilities.BASE_TEST_URL}/shared/test_image_2.png",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL}/http_test_page_2.html",
        0,
        None,
        None,
        "image",
    ),
    (
        f"{utilities.BASE_TEST_URL}/shared/test_script_2.js",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL}/http_test_page_2.html",
        0,
        None,
        None,
        "script",
    ),
    (
        f"{utilities.BASE_TEST_URL}/shared/test_script.js",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        0,
        None,
        None,
        "script",
    ),
    (
        f"{utilities.BASE_TEST_URL}/shared/test_image.png",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        0,
        None,
        None,
        "image",
    ),
    (
        f"{utilities.BASE_TEST_URL}/http_test_page_2.html",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        0,
        None,
        None,
        "sub_frame",
    ),
    (
        f"{utilities.BASE_TEST_URL}/shared/test_style.css",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        0,
        None,
        None,
        "stylesheet",
    ),
    (
        f"{utilities.BASE_TEST_URL_NOPATH}/404.png",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL}/http_test_page_2.html",
        0,
        None,
        None,
        "image",
    ),
    (
        f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/frame1.png",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL}/http_test_page_2.html",
        0,
        None,
        None,
        "image",
    ),
    (
        f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/frame2.png",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL}/http_test_page_2.html",
        0,
        None,
        None,
        "image",
    ),
    (
        f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/req1.png",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        0,
        None,
        None,
        "image",
    ),
    (
        f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/req2.png",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        0,
        None,
        None,
        "image",
    ),
    (
        f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/req3.png",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        0,
        None,
        None,
        "image",
    ),
    (
        f"{utilities.BASE_TEST_URL}/shared/test_image_2.png",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL_NOPATH}",
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        0,
        None,
        None,
        "image",
    ),
}
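
# Illustrative sketch, not part of the original test module: the positional
# format documented above HTTP_REQUESTS could be given field names with a
# NamedTuple. `ExpectedRequest` and `_EXAMPLE_REQUEST` are hypothetical names
# introduced only for this example, the field types are assumptions inferred
# from the values in the set, and the localhost URL is an assumed value of
# utilities.BASE_TEST_URL. Because a NamedTuple is a tuple subclass, an
# instance compares equal to a plain tuple holding the same values.
from typing import NamedTuple, Optional


class ExpectedRequest(NamedTuple):
    request_url: str
    top_level_url: str
    triggering_origin: str
    loading_origin: str
    loading_href: str
    is_XHR: int
    is_tp_content: Optional[int]
    is_tp_window: Optional[int]
    resource_type: str


_EXAMPLE_REQUEST = ExpectedRequest(
    request_url="http://localhost:8000/test_pages/http_test_page.html",
    top_level_url="http://localhost:8000/test_pages/http_test_page.html",
    triggering_origin="undefined",
    loading_origin="undefined",
    loading_href="undefined",
    is_XHR=0,
    is_tp_content=None,
    is_tp_window=None,
    resource_type="main_frame",
)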
# format: (request_url, referrer, location)
# TODO: webext instrumentation doesn't support referrer yet
HTTP_RESPONSES = {
    (
        f"{utilities.BASE_TEST_URL}/http_test_page.html",
        # u'',
        "",
    ),
    (
        f"{utilities.BASE_TEST_URL}/shared/test_favicon.ico",
        # u'',
        "",
    ),
    (
        f"{utilities.BASE_TEST_URL}/shared/test_style.css",
        # u'http://localhost:8000/test_pages/http_test_page.html',
        "",
    ),
    (
        f"{utilities.BASE_TEST_URL}/shared/test_script.js",
        # u'http://localhost:8000/test_pages/http_test_page.html',
        "",
    ),
    (
        f"{utilities.BASE_TEST_URL}/shared/test_image.png",
        # u'http://localhost:8000/test_pages/http_test_page.html',
        "",
    ),
    (
        f"{utilities.BASE_TEST_URL}/http_test_page_2.html",
        # u'http://localhost:8000/test_pages/http_test_page.html',
        "",
    ),
    (
        f"{utilities.BASE_TEST_URL}/shared/test_image_2.png",
        # u'http://localhost:8000/test_pages/http_test_page_2.html',
        "",
    ),
    (
        f"{utilities.BASE_TEST_URL}/shared/test_script_2.js",
        # u'http://localhost:8000/test_pages/http_test_page_2.html',
        "",
    ),
    (
        f"{utilities.BASE_TEST_URL_NOPATH}/404.png",
        # u'http://localhost:8000/test_pages/http_test_page_2.html',
        "",
    ),
    (
        f"{utilities.BASE_TEST_URL}/shared/test_image_2.png",
        # u'http://localhost:8000/test_pages/http_test_page.html',
        "",
    ),
}
# format: (source_url, destination_url, location header)
HTTP_REDIRECTS = {
    (
        f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/req1.png",
        f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/req2.png",
        "req2.png?dst=req3.png&dst=/test_pages/shared/test_image_2.png",
    ),
    (
        f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/req2.png",
        f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/req3.png",
        "req3.png?dst=/test_pages/shared/test_image_2.png",
    ),
    (
        f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/req3.png",
        f"{utilities.BASE_TEST_URL}/shared/test_image_2.png",
        "/test_pages/shared/test_image_2.png",
    ),
    (
        f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/frame1.png",
        f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/frame2.png",
        "frame2.png?dst=/404.png",
    ),
    (
        f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/frame2.png",
        f"{utilities.BASE_TEST_URL_NOPATH}/404.png",
        "/404.png",
    ),
}
# Data for test_cache_hits_recorded
HTTP_CACHED_REQUESTS = {
(
f"{utilities.BASE_TEST_URL}/http_test_page.html",
f"{utilities.BASE_TEST_URL}/http_test_page.html",
"undefined",
"undefined",
"undefined",
0,
None,
None,
"main_frame",
),
(
f"{utilities.BASE_TEST_URL}/shared/test_script_2.js",
f"{utilities.BASE_TEST_URL}/http_test_page.html",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL}/http_test_page_2.html",
0,
None,
None,
"script",
),
(
f"{utilities.BASE_TEST_URL}/shared/test_script.js",
f"{utilities.BASE_TEST_URL}/http_test_page.html",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL}/http_test_page.html",
0,
None,
None,
"script",
),
(
f"{utilities.BASE_TEST_URL}/http_test_page_2.html",
f"{utilities.BASE_TEST_URL}/http_test_page.html",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL}/http_test_page.html",
0,
None,
None,
"sub_frame",
),
(
f"{utilities.BASE_TEST_URL_NOPATH}/404.png",
f"{utilities.BASE_TEST_URL}/http_test_page.html",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL}/http_test_page_2.html",
0,
None,
None,
"image",
),
(
f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/frame1.png",
f"{utilities.BASE_TEST_URL}/http_test_page.html",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL}/http_test_page_2.html",
0,
None,
None,
"image",
),
(
f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/frame2.png",
f"{utilities.BASE_TEST_URL}/http_test_page.html",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL}/http_test_page_2.html",
0,
None,
None,
"image",
),
(
f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/req1.png",
f"{utilities.BASE_TEST_URL}/http_test_page.html",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL}/http_test_page.html",
0,
None,
None,
"image",
),
(
f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/req2.png",
f"{utilities.BASE_TEST_URL}/http_test_page.html",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL}/http_test_page.html",
0,
None,
None,
"image",
),
(
f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/req3.png",
f"{utilities.BASE_TEST_URL}/http_test_page.html",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL}/http_test_page.html",
0,
None,
None,
"image",
),
(
f"{utilities.BASE_TEST_URL}/shared/test_image_2.png",
f"{utilities.BASE_TEST_URL}/http_test_page.html",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL}/http_test_page.html",
0,
None,
None,
"image",
),
}
# format: (request_url, referrer, is_cached)
# TODO: referrer isn't recorded by webext instrumentation yet.
HTTP_CACHED_RESPONSES = {
(
f"{utilities.BASE_TEST_URL}/http_test_page.html",
# u'',
1,
),
(
f"{utilities.BASE_TEST_URL}/shared/test_script.js",
# u'http://localhost:8000/test_pages/http_test_page.html',
1,
),
(
f"{utilities.BASE_TEST_URL}/http_test_page_2.html",
# u'http://localhost:8000/test_pages/http_test_page.html',
1,
),
(
f"{utilities.BASE_TEST_URL}/shared/test_script_2.js",
# u'http://localhost:8000/test_pages/http_test_page_2.html',
1,
),
(
f"{utilities.BASE_TEST_URL_NOPATH}/404.png",
# u'http://localhost:8000/test_pages/http_test_page_2.html',
1,
),
(f"{utilities.BASE_TEST_URL}/shared/test_image_2.png", 1),
}
# format: (source_url, destination_url)
HTTP_CACHED_REDIRECTS = {
(
f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/frame1.png",
f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/frame2.png",
),
(
f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/frame2.png",
f"{utilities.BASE_TEST_URL_NOPATH}/404.png",
),
(
f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/req1.png",
f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/req2.png",
),
(
f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/req2.png",
f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/req3.png",
),
(
f"{utilities.BASE_TEST_URL_NOPATH}/MAGIC_REDIRECT/req3.png",
f"{utilities.BASE_TEST_URL}/shared/test_image_2.png",
),
}
# Test URL attribution for worker script requests
HTTP_WORKER_SCRIPT_REQUESTS = {
(
f"{utilities.BASE_TEST_URL}/http_worker_page.html",
f"{utilities.BASE_TEST_URL}/http_worker_page.html",
"undefined",
"undefined",
"undefined",
0,
None,
None,
"main_frame",
),
(
f"{utilities.BASE_TEST_URL}/shared/test_favicon.ico",
f"{utilities.BASE_TEST_URL}/http_worker_page.html",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL}/http_worker_page.html",
0,
None,
None,
"image",
),
(
f"{utilities.BASE_TEST_URL}/shared/worker.js",
f"{utilities.BASE_TEST_URL}/http_worker_page.html",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL}/http_worker_page.html",
0,
None,
None,
"script",
),
(
f"{utilities.BASE_TEST_URL}/shared/test_image.png",
f"{utilities.BASE_TEST_URL}/http_worker_page.html",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL}/http_worker_page.html",
1,
None,
None,
"xmlhttprequest",
),
(
f"{utilities.BASE_TEST_URL}/shared/test_image.png",
f"{utilities.BASE_TEST_URL}/shared/worker.js",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL_NOPATH}",
f"{utilities.BASE_TEST_URL}/shared/worker.js",
1,
None,
None,
"xmlhttprequest",
),
}
# Test URL-attribution for Service Worker requests.
HTTP_SERVICE_WORKER_REQUESTS = {
(
"http://localhost:8000/test_pages/http_service_worker_page.html",
"http://localhost:8000/test_pages/http_service_worker_page.html",
"undefined",
"undefined",
"undefined",
0,
None,
None,
"main_frame",
),
(
"http://localhost:8000/test_pages/shared/test_favicon.ico",
"http://localhost:8000/test_pages/http_service_worker_page.html",
"http://localhost:8000",
"http://localhost:8000",
"http://localhost:8000/test_pages/http_service_worker_page.html",
0,
None,
None,
"image",
),
(
"http://localhost:8000/test_pages/shared/service_worker.js",
"http://localhost:8000/test_pages/http_service_worker_page.html",
"http://localhost:8000",
"http://localhost:8000",
"http://localhost:8000/test_pages/http_service_worker_page.html",
0,
None,
None,
"script",
),
(
"http://localhost:8000/test_pages/shared/test_image.png",
"http://localhost:8000/test_pages/http_service_worker_page.html",
"http://localhost:8000",
"http://localhost:8000",
"http://localhost:8000/test_pages/http_service_worker_page.html",
1,
None,
None,
"xmlhttprequest",
),
(
"http://localhost:8000/test_pages/shared/test_image_2.png",
"http://localhost:8000/test_pages/shared/service_worker.js",
"http://localhost:8000",
"http://localhost:8000",
"http://localhost:8000/test_pages/shared/service_worker.js",
1,
None,
None,
"xmlhttprequest",
),
}
BASE_PATH = os.path.dirname(os.path.realpath(__file__))


class TestHTTPInstrument(OpenWPMTest):
    def get_config(
        self, data_dir: Optional[Path]
    ) -> Tuple[ManagerParams, List[BrowserParams]]:
        manager_params, browser_params = self.get_test_config(data_dir)
        browser_params[0].http_instrument = True
        return manager_params, browser_params

    def test_worker_script_requests(self):
        """Check correct URL attribution for requests made by worker script"""
        test_url = utilities.BASE_TEST_URL + "/http_worker_page.html"
        db = self.visit(test_url)

        request_id_to_url = dict()

        # HTTP Requests
        rows = db_utils.query_db(db, "SELECT * FROM http_requests")
        observed_records = set()
        for row in rows:
            observed_records.add(
                (
                    row["url"].split("?")[0],
                    row["top_level_url"],
                    row["triggering_origin"],
                    row["loading_origin"],
                    row["loading_href"],
                    row["is_XHR"],
                    row["is_third_party_channel"],
                    row["is_third_party_to_top_window"],
                    row["resource_type"],
                )
            )
            request_id_to_url[row["request_id"]] = row["url"]

        assert HTTP_WORKER_SCRIPT_REQUESTS == observed_records

    def test_service_worker_requests(self):
        """Check correct URL attribution for requests made by service worker"""
        test_url = utilities.BASE_TEST_URL + "/http_service_worker_page.html"
        db = self.visit(test_url)

        request_id_to_url = dict()

        # HTTP Requests
        rows = db_utils.query_db(db, "SELECT * FROM http_requests")
        observed_records = set()
        for row in rows:
            observed_records.add(
                (
                    row["url"].split("?")[0],
                    row["top_level_url"],
                    row["triggering_origin"],
                    row["loading_origin"],
                    row["loading_href"],
                    row["is_XHR"],
                    row["is_third_party_channel"],
                    row["is_third_party_to_top_window"],
                    row["resource_type"],
                )
            )
            request_id_to_url[row["request_id"]] = row["url"]

        assert HTTP_SERVICE_WORKER_REQUESTS == observed_records


class TestPOSTInstrument(OpenWPMTest):
    """Make sure we can capture all the POST request data.

    The encoding types tested are explained here:
    https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/Using_XMLHttpRequest#Using_nothing_but_XMLHttpRequest
    """

    post_data = (
        '{"email":["test@example.com"],'
        '"username":["name surname+你好"],'
        '"test":["ПриватБанк – банк для тих, хто йде вперед"]}'
    )
    post_data_json = json.loads(post_data)
    post_data_multiline = (
        r'{"email":["test@example.com"],"username":'
        r'["name surname+你好"],'
        r'"test":["ПриватБанк – банк для тих, хто йде вперед"],'
        r'"multiline_text":["line1\r\n\r\nline2 line2_word2"]}'
    )
    post_data_multiline_json = json.loads(post_data_multiline)
    post_data_multiline_raw = (
        "email=test@example.com\r\n"
        "username=name surname+你好\r\n"
        "test=ПриватБанк – банк для тих, хто йде вперед\r\n"
        "multiline_text=line1\r\n\r\n"
        "line2 line2_word2\r\n"
    )

    def get_config(
        self, data_dir: Optional[Path] = None
    ) -> Tuple[ManagerParams, List[BrowserParams]]:
        manager_params, browser_params = self.get_test_config(data_dir)
        browser_params[0].http_instrument = True
        return manager_params, browser_params

    def get_post_requests_from_db(self, db):
        """Query the crawl database and return the POST requests."""
        return db_utils.query_db(
            db,
            "SELECT * FROM http_requests WHERE method = 'POST'",
        )

    def get_post_request_body_from_db(self, db, raw=False):
        """Return the body of the first POST request in crawl db."""
        posts = self.get_post_requests_from_db(db)
        if raw:
            return base64.b64decode(json.loads(posts[0]["post_body_raw"])[0][1])
        else:
            return posts[0]["post_body"]

    def test_record_post_data_x_www_form_urlencoded(self):
        encoding_type = "application/x-www-form-urlencoded"
        db = self.visit("/post_request.html?encoding_type=" + encoding_type)
        post_body = self.get_post_request_body_from_db(db)
        assert json.loads(post_body) == self.post_data_multiline_json

    def test_record_post_data_text_plain(self):
        encoding_type = "text/plain"
        db = self.visit("/post_request.html?encoding_type=" + encoding_type)
        post_body = self.get_post_request_body_from_db(db, True)
        if not isinstance(self.post_data_multiline_raw, str):
            expected = self.post_data_multiline_raw.decode("utf-8")
        else:
            expected = self.post_data_multiline_raw
        assert post_body.decode("utf8") == expected

    def test_record_post_data_multipart_formdata(self):
        encoding_type = "multipart/form-data"
        db = self.visit("/post_request.html?encoding_type=" + encoding_type)
        post_body = self.get_post_request_body_from_db(db)
        assert json.loads(post_body) == self.post_data_multiline_json
        post_row = self.get_post_requests_from_db(db)[0]
        headers = post_row["headers"]
        # make sure the "request headers from upload stream" are stored in db
        assert "Content-Type" in headers
        assert encoding_type in headers
        assert "Content-Length" in post_row["headers"]

    def test_record_post_data_ajax(self, tmpdir):
        post_format = "object"
        db = self.visit("/post_request_ajax.html?format=" + post_format)
        post_body = self.get_post_request_body_from_db(db)
        assert json.loads(post_body) == self.post_data_json

    def test_record_post_data_ajax_no_key_value(self):
        """Test AJAX payloads that are not in the key=value form."""
        post_format = "noKeyValue"
        db = self.visit("/post_request_ajax.html?format=" + post_format)
        post_body = self.get_post_request_body_from_db(db, True)
        assert post_body.decode("utf8") == "test@example.com + name surname"

    def test_record_post_data_ajax_no_key_value_base64_encoded(self):
        """Test Base64 encoded AJAX payloads (no key=value form)."""
        post_format = "noKeyValueBase64"
        db = self.visit("/post_request_ajax.html?format=" + post_format)
        post_body = self.get_post_request_body_from_db(db, True)
        assert post_body.decode("utf8") == (
            "dGVzdEBleGFtcGxlLmNvbSArIG5hbWUgc3VybmFtZQ=="
        )

    def test_record_post_formdata(self):
        post_format = "formData"
        db = self.visit("/post_request_ajax.html?format=" + post_format)
        post_body = self.get_post_request_body_from_db(db)
        assert json.loads(post_body) == self.post_data_json

    def test_record_binary_post_data(self):
        post_format = "binary"
        db = self.visit("/post_request_ajax.html?format=" + post_format)
        post_body = self.get_post_request_body_from_db(db, True)
        # Binary strings get put into the database as if they were latin-1.
        assert bytes(bytearray(range(100))) == post_body

    @pytest.mark.skip(
        reason="Firefox is currently not able to return the "
        "file content for an upload, only the filename"
    )
    def test_record_file_upload(self, task_manager_creator):
        """Test that we correctly capture the uploaded file contents.

        We upload a CSS file and a PNG file to test both text based and
        binary files.

        File uploads are not expected in the crawl data, but we make sure we
        correctly parse the POST data in this very common scenario.

        Firefox is currently not able to return the FormData with the file
        contents; only the filenames are returned. This is due to
        a limitation in the current API implementation:

        https://searchfox.org/mozilla-central/rev/b3b401254229f0a26f7ee625ef5f09c6c31e3949/toolkit/components/extensions/webrequest/WebRequestUpload.jsm#339

        Therefore, the test is currently skipped.
        """
        img_file_path = os.path.abspath("test_pages/shared/test_image.png")
        css_file_path = os.path.abspath("test_pages/shared/test_style.css")

        manager_params, browser_params = self.get_config()
        manager, db_path = task_manager_creator((manager_params, browser_params))
        test_url = utilities.BASE_TEST_URL + "/post_file_upload.html"
        cs = command_sequence.CommandSequence(test_url)
        cs.get(sleep=0, timeout=60)
        cs.append_command(FilenamesIntoFormCommand(img_file_path, css_file_path))
        manager.execute_command_sequence(cs)
        manager.close()

        post_body = self.get_post_request_body_from_db(db_path)
        # Binary strings get put into the database as if they were latin-1.
        with open(img_file_path, "rb") as f:
            img_file_content = f.read().strip().decode("latin-1")
        with open(css_file_path, "rt") as f:
            css_file_content = f.read().strip()
        # POST data is stored as JSON in the DB
        post_body_decoded = json.loads(post_body)
        expected_body = {
            "username": "name surname+",
            "upload-css": css_file_content,
            "upload-img": img_file_content,
        }
        assert expected_body == post_body_decoded


@pytest.mark.parametrize("delayed", [True, False])
def test_page_visit(task_manager_creator, http_params, delayed):
    test_url = utilities.BASE_TEST_URL + "/http_test_page.html"
    manager_params, browser_params = http_params()
    if delayed:
        for browser_param in browser_params:
            browser_param.custom_params[
                "pre_instrumentation_code"
            ] = """
                const startTime = Date.now();
                while (Date.now() - startTime < 5000) { // Delaying for 5s
                    console.log("delaying startup");
                };
                """

    tm, db = task_manager_creator((manager_params, browser_params))
    with tm as tm:
        tm.get(test_url)

    request_id_to_url = dict()

    # HTTP Requests
    rows = db_utils.query_db(db, "SELECT * FROM http_requests")
    observed_records = set()
    for row in rows:
        observed_records.add(
            (
                row["url"].split("?")[0],
                row["top_level_url"],
                row["triggering_origin"],
                row["loading_origin"],
                row["loading_href"],
                row["is_XHR"],
                row["is_third_party_channel"],
                row["is_third_party_to_top_window"],
                row["resource_type"],
            )
        )
        request_id_to_url[row["request_id"]] = row["url"]
    assert HTTP_REQUESTS == observed_records

    # HTTP Responses
    rows = db_utils.query_db(db, "SELECT * FROM http_responses")
    observed_records: Set[Tuple[str, str]] = set()
    for row in rows:
        observed_records.add(
            (
                row["url"].split("?")[0],
                # TODO: webext-instrumentation doesn't support referrer
                # yet | row['referrer'],
                row["location"],
            )
        )
        assert row["request_id"] in request_id_to_url
        assert request_id_to_url[row["request_id"]] == row["url"]
    assert HTTP_RESPONSES == observed_records

    # HTTP Redirects
    rows = db_utils.query_db(db, "SELECT * FROM http_redirects")
    observed_records = set()
    for row in rows:
        # TODO: webext instrumentation doesn't support new_request_id yet
        # src = request_id_to_url[row['old_request_id']].split('?')[0]
        # dst = request_id_to_url[row['new_request_id']].split('?')[0]
        src = row["old_request_url"].split("?")[0]
        dst = row["new_request_url"].split("?")[0]
        headers = json.loads(row["headers"])
        location = None
        for header, value in headers:
            if header.lower() == "location":
                location = value
                break
        observed_records.add((src, dst, location))
    assert HTTP_REDIRECTS == observed_records
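

# A minimal sketch of the header lookup performed in the redirect check above,
# pulled out as a standalone helper. `find_header_value` is a hypothetical
# name and is not part of OpenWPM. It assumes, as the loop above does, that
# the `headers` column holds a JSON-encoded list of [name, value] pairs, and
# it relies on the `json` import at the top of this module.
def find_header_value(headers_json, name):
    """Return the value of the first header matching `name`, else None."""
    wanted = name.lower()
    for header, value in json.loads(headers_json):
        # Header names are case-insensitive, so compare lowercased names.
        if header.lower() == wanted:
            return value
    return None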
def test_javascript_saving(http_params, xpi, server):
    """check that javascript content is saved and hashed correctly"""
    test_url = utilities.BASE_TEST_URL + "/http_test_page.html"
    manager_params, browser_params = http_params()

    for browser_param in browser_params:
        browser_param.http_instrument = True
        browser_param.save_content = "script"

    structured_storage = SQLiteStorageProvider(
        db_path=manager_params.data_directory / "crawl-data.sqlite"
    )
    ldb_path = Path(manager_params.data_directory) / "content.ldb"
    unstructured_storage = LevelDbProvider(db_path=ldb_path)
    manager = task_manager.TaskManager(
        manager_params, browser_params, structured_storage, unstructured_storage
    )
    manager.get(url=test_url, sleep=1)
    manager.close()
    expected_hashes = {
        "0110c0521088c74f179615cd7c404816816126fa657550032f75ede67a66c7cc",
        "b34744034cd61e139f85f6c4c92464927bed8343a7ac08acf9fb3c6796f80f08",
    }
    for chash, content in db_utils.get_content(ldb_path):
        chash = chash.decode("ascii").lower()
        pyhash = sha256(content).hexdigest().lower()
        assert pyhash == chash  # Verify expected key (sha256 of content)
        assert chash in expected_hashes
        expected_hashes.remove(chash)
    assert len(expected_hashes) == 0  # All expected hashes have been seen


def test_document_saving(http_params, xpi, server):
    """check that document content is saved and hashed correctly"""
    test_url = utilities.BASE_TEST_URL + "/http_test_page.html"
    expected_hashes = {
        "2390eceab422db15bc45940b7e042e83e6cbd5f279f57e714bc4ad6cded7f966",
        "25343f42d9ffa5c082745f775b172db87d6e14dfbc3160b48669e06d727bfc8d",
    }
    manager_params, browser_params = http_params()
    for browser_param in browser_params:
        browser_param.http_instrument = True
        browser_param.save_content = "main_frame,sub_frame"

    structured_storage = SQLiteStorageProvider(
        db_path=manager_params.data_directory / "crawl-data.sqlite"
    )
    ldb_path = Path(manager_params.data_directory) / "content.ldb"
    unstructured_storage = LevelDbProvider(db_path=ldb_path)
    manager = task_manager.TaskManager(
        manager_params, browser_params, structured_storage, unstructured_storage
    )
    manager.get(url=test_url, sleep=1)
    manager.close()
    for chash, content in db_utils.get_content(ldb_path):
        chash = chash.decode("ascii").lower()
        pyhash = sha256(content).hexdigest().lower()
        assert pyhash == chash  # Verify expected key (sha256 of content)
        assert chash in expected_hashes
        expected_hashes.remove(chash)
    assert len(expected_hashes) == 0  # All expected hashes have been seen


def test_content_saving(http_params, xpi, server):
    """check that content is saved and hashed correctly"""
    test_url = utilities.BASE_TEST_URL + "/http_test_page.html"
    manager_params, browser_params = http_params()
    for browser_param in browser_params:
        browser_param.http_instrument = True
        browser_param.save_content = True

    db = manager_params.data_directory / "crawl-data.sqlite"
    structured_storage = SQLiteStorageProvider(db_path=db)
    ldb_path = Path(manager_params.data_directory) / "content.ldb"
    unstructured_storage = LevelDbProvider(db_path=ldb_path)
    manager = task_manager.TaskManager(
        manager_params, browser_params, structured_storage, unstructured_storage
    )
    manager.get(url=test_url, sleep=1)
    manager.close()

    rows = db_utils.query_db(db, "SELECT * FROM http_responses;")
    disk_content = dict()
    for row in rows:
        if "MAGIC_REDIRECT" in row["url"] or "404" in row["url"]:
            continue
        path = urlparse(row["url"]).path
        with open(os.path.join(BASE_PATH, path[1:]), "rb") as f:
            content = f.read()
        chash = sha256(content).hexdigest()
        assert chash == row["content_hash"]
        disk_content[chash] = content

    ldb_content = dict()
    for chash, content in db_utils.get_content(ldb_path):
        chash = chash.decode("ascii")
        ldb_content[chash] = content

    for k, v in disk_content.items():
        assert v == ldb_content[k]


def test_cache_hits_recorded(http_params, task_manager_creator):
    """Verify all http responses are recorded, including cached responses

    Note that we expect to see all of the same requests and responses
    during the second visit (even if cached) except for images. Cached
    images do not trigger Observer Notification events.
    See Bug 634073: https://bugzilla.mozilla.org/show_bug.cgi?id=634073

    The test page includes an image which does several permanent redirects
    before returning a 404. We expect to see new requests and responses
    for this image when the page is reloaded. Additionally, the redirects
    should be cached.
    """
    test_url = utilities.BASE_TEST_URL + "/http_test_page.html"
    manager_params, browser_params = http_params()
    # ensuring that we only spawn one browser
    manager_params.num_browsers = 1
    manager, db = task_manager_creator((manager_params, [browser_params[0]]))
    for i in range(2):
        cs = CommandSequence(test_url, site_rank=i)
        cs.get(sleep=5)
        manager.execute_command_sequence(cs)

    manager.close()

    request_id_to_url = dict()

    # HTTP Requests
    rows = db_utils.query_db(
        db,
        """
        SELECT hr.*
        FROM http_requests as hr
        JOIN site_visits sv ON sv.visit_id = hr.visit_id and sv.browser_id = hr.browser_id
        WHERE sv.site_rank = 1""",
    )
    observed_records = set()
    for row in rows:
        # HACK: favicon caching is unpredictable, don't bother checking it
        if row["url"].split("?")[0].endswith("favicon.ico"):
            continue
        observed_records.add(
            (
                row["url"].split("?")[0],
                row["top_level_url"],
                row["triggering_origin"],
                row["loading_origin"],
                row["loading_href"],
                row["is_XHR"],
                row["is_third_party_channel"],
                row["is_third_party_to_top_window"],
                row["resource_type"],
            )
        )
        request_id_to_url[row["request_id"]] = row["url"]
    assert observed_records == HTTP_CACHED_REQUESTS

    # HTTP Responses
    rows = db_utils.query_db(
        db,
        """
        SELECT hp.*
        FROM http_responses as hp
        JOIN site_visits sv ON sv.visit_id = hp.visit_id and sv.browser_id = hp.browser_id
        WHERE sv.site_rank = 1""",
    )
    observed_records = set()
    for row in rows:
        # HACK: favicon caching is unpredictable, don't bother checking it
        if row["url"].split("?")[0].endswith("favicon.ico"):
            continue
        observed_records.add(
            (
                row["url"].split("?")[0],
                # TODO: referrer isn't available yet in the
                # webext instrumentation | row['referrer'],
                row["is_cached"],
            )
        )
        assert row["request_id"] in request_id_to_url
        assert request_id_to_url[row["request_id"]] == row["url"]
    assert HTTP_CACHED_RESPONSES == observed_records

    # HTTP Redirects
    rows = db_utils.query_db(
        db,
        """
        SELECT hr.*
        FROM http_redirects as hr
        JOIN site_visits sv ON sv.visit_id = hr.visit_id and sv.browser_id = hr.browser_id
        WHERE sv.site_rank = 1""",
    )
    observed_records = set()
    for row in rows:
        # TODO: new_request_id isn't supported yet
        # src = request_id_to_url[row['old_request_id']].split('?')[0]
        # dst = request_id_to_url[row['new_request_id']].split('?')[0]
        src = row["old_request_url"].split("?")[0]
        dst = row["new_request_url"].split("?")[0]
        observed_records.add((src, dst))
    assert HTTP_CACHED_REDIRECTS == observed_records


Command refactoring (#750) * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * Ported SaveScreenshotFullPage #763 * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * Ported DumpPageSource and RecursiveDumpPageSource (#767) * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove custom function command and format code * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed 
everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * remove duplicate append_command * Refactored GetCommand, BrowseCommand to have execute method * Fixed type name format issues in __issue_command * Fixed everything I broke * Changed import style so tests can run * Added BrowseCommad to imports * Added some more self * Added logging to explain failing test * Added one more self * Ported SaveScreenshotCommand It now uses the new command.execute(...) 
syntax * Ported SaveScreenshotFullPage #763 * Ported DumpPageSource and RecursiveDumpPageSource (#767) * Command refactoring (#770) * attempt at refactoring save_screenshot * fixed indentation, attempt at refactoring save_screenshot * refactored SaveScreenshot command to have execute method * reformatted code using black * refactored savefullscreenshot command to follow command sequence * formatted files with black * removed extraneous commands * refactored dump page source and formatted code with black * reformatted recursive dump page source command and formatted code w black * formatted files using isort * formatted all files with isort * refactor finalize command * refactored initalize command and formatted with black and isort * missed a conflict * Ran isort * Added append_command * generate new xpi * Fixing tests * Fixing tests * Fixing up more tests * Removed type annotations * Fixing tests * Fixing tests * Removed command_executor * Moved Commands to commands * Fixing imports * Fixed skipped test * Removed duplicate append_command * docs: update adding command in usingOpenWPM * Forgot to save * Removed datadir * Cleaning up imports * Implemented simple command * Added documentation to simple_command.py * Renamed to custom_command.py * Moved docs around * Referencing BaseCommand.execute * Update docs/Using_OpenWPM.md Co-authored-by: Steven Englehardt <senglehardt@mozilla.com> Co-authored-by: Cyrus <cyruskarsan@gmail.com> Co-authored-by: cyruskarsan <55566678+cyruskarsan@users.noreply.github.com> Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
2021-01-09 13:15:01 +03:00
class FilenamesIntoFormCommand(BaseCommand):
    def __init__(self, img_file_path: str, css_file_path: str) -> None:
        self.img_file_path = img_file_path
        self.css_file_path = css_file_path

    def execute(
        self,
        webdriver,
        browser_params,
        manager_params,
        extension_socket,
    ):
        img_file_upload_element = webdriver.find_element_by_id("upload-img")
        css_file_upload_element = webdriver.find_element_by_id("upload-css")
        img_file_upload_element.send_keys(self.img_file_path)
        css_file_upload_element.send_keys(self.css_file_path)
        sleep(5)  # wait for the form submission (3 sec after onload)